June 29, 2021
Hi Power Query Pros,
I’m attempting to import and combine my order forecast, demand, and actual shipments in order to analyze and compare all requirements/data and better manage component parts. The problem I’m facing is that my forecast and demand are imported on a pdf file via our EDI connection with the customer (I can pull the actuals into an excel or CSV file with little to no cleanup needed from my ERP system). My goal would be to setup a folder for each the forecast and demand where I drop each file type and have the data automatically update and dynamically add requirements/forecast. Due to the layout of the order, I’m having a ton of trouble transforming the files into a usable data structure. As you can see on the sample file, the header information will be needed because this is where the repeating info for each order is found in two vertical header columns (part number, customer, ship-to, date of order release, etc.) BUUUUTTT, the actual order line info (due date, qty ordered, F/P order, etc) is formatted below all of the header information (with another layer of horizontal headers) and in offset columns. One of these files typically can have 60-200 pages, repeating the format of all the parts header info followed by the actual line due dates and qtys. After connecting in the folder and pulling in the files, I can’t even get the sample transformation file formatted to where headers are headers and rows are rows because of all of the multiple layers and complexity of the layout and cleanup. I’ve attempted separating into more columns, merging the horizontal headers together in order to transpose them before splitting again, pivoting/unpivoting, and watching YouTube videos and reading tutorials for HOURS! The other option I can see, but am unsure how to actually combine back together with different parameters is transforming each table type (the files import as both full pages and individual tables since it’s pdf). I’m just too inexperienced with Power Query (pretty new user) to understand how each type of table can be formatted, have different M code, and then combine them all back together using the different parameters to connect the same pdf. I’ve spend 12-16+++ hours on this one file! ANY help figuring this out is beyond appreciated so I can actually move to the modeling and analysis!!!
June 29, 2021
I forgot to add, even though some of the header sections may be blank on this sample, most customers do have something in every section. This particular customer only provides the bare minimum though. Some of those header categories repeat, and I will not need all of them in my data, just the most important. I also added a sample of the demand/shipping schedule layout. It’s similar but slightly different. I will be removing the the ASN/Last Shipment and Fab Qty tables on this format. But, in transforming the data, they do provide another layer of headers/columns when pulling the entire page into the sample transformation.
July 16, 2010
Welcome to our forum!
I think you're on the right track in trying to import this data in two queries, one for the header information and one for the transaction data. You can then bring the header information back into the transaction data query as separate columns containing the key information for each.
In the attached file I've appended the tables of data in the PDF in the Transactions query, and then in a separate query called Headers I've extracted the header info. I've then brought that info into the Transactions query. See file attached.
Note: you'll need to edit the connection to point at the PDF on your PC. You can do this in the Power Query editor via the Home tab > Data source settings > Change Source.
Also, you'll need to rename the 'custom.n' headers and probably clean them up a bit more, but hopefully you'll get the idea.
P.S. In future, please help us to read your questions by adding paragraphs. It's very difficult to read a solid block of text. I'd rather have too many paragraphs than none.
June 29, 2021
Thank you for your response and help on this. I’m going to hop on excel and see if I can work through your solution now.
Every time I would delve into the multiple table scenario I kept coming across your “Get Multiple Files Containing Multiple Sheets” Video, but I think I was getting stuck on the part when you open the whole table up given that my data is not in excel workbook sheets and isn’t a clean layout.
The part I’m still confused about is how to get Power Query to recognize to run one transformation on certain parts of the document and another transformation on the other parts; and then combining them to run all on the same document over and over. I also looked into header mapping (which I think is what your explaining above), but couldn’t get it to recognize the headers. Probably because of errors in my code, maybe? I’ll with your file and see how I fare.
I definitely think I would benefit from your courses. I’m beginning to realize I probably don’t have the fundamental knowledge I need. I’ll be looking into that for a long term solution, since a good deal of my job is spent working with data that could benefit from Power Query.