May 19, 2021
I am building a Query to access Web Data stored in a PDF.
I will turn this into a Custom Function so that it can accumulate data from many Web held PDFs and then analyse it with Power Pivot.
My problem is that the data arrives in a 8 x 7 matrix with row 1 being column headers and column 1 being row titles, the rest is numerical data.
In order to accumulate and analyse the data I believe I need to transform it to 2 rows x 42 columns.
The attached sheet shows the problem simplified to 4 x 4 which I need to transform to 2 x 9
Where row 1 will be the headers which are common to all the data sets and be a compound of the existing column name and each row title. To give a practical example the data could be garment prices, the columns could be 'shirts', 'ties', 'trousers' and the rows could be 'Gucci', 'Boss', 'M&S' and I'm looking to get a 2-row 9-column matrix with headings:
Gucci-Shirts, Gucci-Ties, Gucci-Trousers, Boss-Shirts... ... M&S-Ties, M&S-Trousers.
Are there any transformations in the Power Query Editor ribbon that will achieve this or, are there functions in M which I can read-up about? Any help would be greatly appreciated.
Many thanks,
Simon,
(Bedford, UK)
Moderators
January 31, 2022
Moderators
January 31, 2022
May 19, 2021
Riny, you are a genius! Thank you very much.
For amusement and as a warning, I attach below the 'solution' from ChatGPT and below that, it's second attempt when I discovered it used a function that doesn't exist...
Certainly, I can provide you with detailed instructions on how to use Power Automate Desktop to interrogate PDF documents on the UK Government's Digital Marketplace and extract 42 cells of data from a table of 7 columns and 8 rows, including the common titles and headings of the rows and columns. Here are the steps you can follow:
1. **Launch Power Automate Desktop**:
- Ensure you have Power Automate Desktop installed on your computer.
2. **Create a New Flow**:
- Open Power Automate Desktop and create a new flow for this specific task.
3. **Load Excel File**:
- Use the "Excel - Load Workbook" action to load your Excel spreadsheet. Set the workbook path to "C:\Users\Name\Desktop\URL.xlsx" and choose the "URLs" table.
4. **Iterate Through URLs**:
- Add a "For Each Row" loop to iterate through the rows of the "URLs" table. This will allow you to process each URL one by one.
5. **Open PDF Document**:
- Within the loop, use the "Web - Open Browser" action to open the URL from the current row in your Excel table. Make sure to navigate to the specific PDF page you want to extract data from.
6. **Extract PDF Data**:
- Use the PDF scraping functionality in Power Automate Desktop to extract data from the PDF table. Here's a high-level overview of how you can do this:
a. Use the "PDF - Start Page" action to specify the PDF page you want to work with.
b. Use the "PDF - Extract Table" action to define the region of the table you want to extract. Specify the coordinates for the top-left and bottom-right corners of the table.
c. Iterate through the table to extract each cell's content. You can use a combination of "PDF - Get Text" actions and variables to store the extracted data.
7. **Store Extracted Data**:
- Create variables or use an array to store the extracted data from each PDF document. You can append the extracted data to this storage for each iteration of the loop.
8. **Repeat for All URLs**:
- Continue the loop until you have processed all the URLs in your Excel table.
9. **Organize Extracted Data**:
- After processing all the PDFs, you will have the extracted data stored in your variables or arrays. Organize this data as needed.
10. **Save or Export Data**:
- You can save the extracted data to a new Excel spreadsheet or any other desired format using the "Excel - Write Range" action or other appropriate actions.
11. **Close PDF Documents and Excel File**:
- Make sure to close all open PDF documents and the Excel file.
12. **End the Flow**:
- End the flow gracefully.
These steps provide a general outline of how to use Power Automate Desktop to achieve your task. You may need to adjust the specifics based on the layout and structure of the PDF documents on the Digital Marketplace. Additionally, testing and debugging the flow as you build it is essential to ensure it works as expected.
Step 4.b: There is not a Combine Columns option on the transform tab
ChatGPT
2 / 2
I apologize for any confusion. You're right; there is no direct "Combine Columns" option in Power Query. To achieve the desired result of combining values from multiple columns into a single column with a separator, you can use the "Custom Column" option. Here's how you can do it:
After you have promoted the first row as headers in your Query Editor:
Select the column containing your header values (the first column after the transposition).
Go to the "Add Column" tab in the Query Editor.
Click on "Custom Column."
In the "Custom Column" dialog that appears, you can enter a formula to combine the values from multiple columns. In your case, you want to combine columns with a "-" separator. You can use the following formula in the dialog:
css
Copy code
= Table.ColumnNames(Source)[0] & "-" & [Column1] & "-" & [Column2] & "-" & [Column3] & "-" & [Column4] & "-" & [Column5] & "-" & [Column6]
Here, "Source" should be replaced with the name of your query if it's different. This formula concatenates the values from the columns with a "-" separator.
Click OK to create the custom column.
This will create a new column with the combined values using the specified separator. You can then proceed with the remaining steps as mentioned earlier to achieve your desired 2 x 42 matrix.
1 Guest(s)