Office 365/Microsoft 365 users can now import data from a PDF to Excel using Power Query*.
It’s super handy for importing data tables inside PDF documents because Excel locates them for you ready to import. And if the data is a little messy you can use the Power Query transform tools to clean it before loading it into the Excel file.
*Only available to Office/Microsoft 365 subscribers, find out how to get the latest updates.
Tip: If you don’t have Office/Microsoft 365 then this same functionality is available in Power BI Desktop.
Watch the Video
Enter your email address below to download the sample workbook.
Import Data from a PDF to Excel Steps
Step 1: Get Data from PDF - It’s as easy as going to the Data tab of the ribbon > Get Data > From File > From PDF:
Step 2: Locate the PDF File you want to import to Excel - Browse to the location the PDF is saved > click Import:
Step 3: Select Tables and Pages in PDF - The Power Query Navigator window opens with a list of pages and tables Excel has identified in the PDF. You can select a table or page to preview in the pane on the right, as shown below with the table on page 7 displayed:
Step 4: Import Data from PDF or Clean - From here you can click the ‘Load’ button to import data directly to the Excel worksheet or to the Data Model. However, you’re most likely to want to do some more cleaning of the data first, in which case click the ‘Transform Data’ button to open the Power Query Editor:
Notice Power Query has correctly identified the column labels and placed them in the header row. If it doesn’t do this automatically, click on the drop down in the top left of the table > Use First Row as Headers:
Not all tables will import as easily. For example, Table2 in this PDF has column headers split over multiple rows. Watch the video to see how to fix them so they’re in a single row and can form the column labels.
When you’re done with tidying up the data, give the query a name in the Properties pane, then go to the Home tab > Close & Load to finalise importing data from the PDF to Excel.
Import Multiple Tables/Pages from PDF
If you have tables that run over multiple pages, for example credit card or bank statements, Power Query will automatically combine then into a single table.
You can also import consecutive pages with the StartPage and EndPage optional parameters like so:
= Pdf.Tables(File.Contents("C:\Sample.pdf"), [StartPage=3, EndPage=5])
More on the Pdf.Tables function here.
Import Multiple PDFs From Folder
If you have multiple PDFs containing data or tables with the same structure, you can use the Power Query Get Data From Folder connector to get them and import the data into a single table.