May 19, 2022
I have a PDF file imported to Power Query. The tables will not work as there are tables on every page. It reads better importing the pages. There could be anything up 30 pages. I can tidy each page by highlighting 1st column, hiding null rows and then making 1st row as header. I can do this for each page and them append them all. I just want to know if I can create 2 measures or (Macro) in Power Query editor to do this for each page. 1 to hide null rows and 2 to make 1st row as header row on each page. I am attaching a sample, only 2 pages but there would be many more. If not possible no worries.
Moderators
January 31, 2022
I would create a function that picks-up the available page from the PDF.
Begin with finding out how many pages your PDF has and create a table in Excel (or extracted from the PDF) with a sequence from Page001 to let's say Page005.
First write a query that does all the transfrmation you need. Now edit that query as indicated below. The bolded text is what you need to add/edit. Let's call this function fxGetPages.
(PageNr)=>
let
Source = Pdf.Tables(File.Contents("C:\Users\.....\EFT 17.06.pdf"), [Implementation="1.3"]),
Page = Source{[Id=PageNr]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Page, [PromoteAllScalars=true])
in
#"Promoted Headers"
Now go the table with the Page numbers. Add a column by invoking a custom function. Select fxGetPages and select the column that contains the page numbers. Expand the added column with the tables and you'll see that all transformation have been done to all pages in your PDF.
The attached workbook contains an example that only promotes the headers and should work on your system as it points to the PDF on your system. Come back here if you get stuck.
1 Guest(s)