Extracting URL from the Webpage from multiple HTML tables|Power Query|Excel Forum|My Online Training Hub

Ritabrata Bhattacharya

September 2, 2022 - 4:10 am

Hi,

Here is the scenario in which I got stuck; I am seeking your advice on it.

Here is the GitHub link containing category-wise public API links:

https://github.com/public-apis.....#test-data

It contains around 51 different categories, listed as an index at the beginning. As you scroll down the page, you will find that each category is presented as an HTML table.

My objective is to fetch every API URL under each topic, along with the other table information, and collate everything into one single table.

To accomplish the task I chose Power Query and tried it in both Excel (Office 365) and Power BI Desktop.

The challenges I faced while executing the task:

Step 1: Using the above GitHub link, I got this far:

[Image Can Not Be Found]

 

Now, if I expand the table, I can capture everything except the link (URL) of each API in each category. I then tried to use the following code snippet, adapted from Chris Webb's blog, as an intermediate step to fetch the URL of each API in each category.
Chris Webb's Blog link: Chris Webb's BI Blog: Using Html.Table() To Extract URLs From A Web Page In Power BI/Power Query M C...

The relevant portion of the code after my modifications is:

"Added Custom" = Table.AddColumn(Html.Table(Source, {{"Links", "a[href^=""http""]", each [Attributes][href]}}))

and later this version, following this blog:

Power Query – how to simply get hyperlinks from webpages – Trainings, consultancy, tutorials

#"Added Custom" = Table.AddColumn(Source, {{"API_URL", ":nth-last-child(155) > TBODY > TR > :nth-child(1) > A[rel=""nofollow""]:nth-child(1):nth-last-child(1)", each [Attributes][href]?}}, [RowSelector="TABLE:nth-child(20) > TBODY > TR"])

In neither case was I able to achieve the desired goal.
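For reference, the two functions used in the snippets above take differently shaped arguments. Table.AddColumn expects a table, a new column name and a generator function, while Html.Table expects the page text, a list of {column name, CSS selector, optional transform} entries and an optional record such as [RowSelector = ...]. The following is only a minimal sketch of those shapes: the inline HTML, the column names and the selectors are hypothetical stand-ins, not the actual markup of the GitHub page.

```powerquery
let
    // Hypothetical inline HTML standing in for the downloaded page text
    Source = "<table><tbody>"
        & "<tr><td><a href=""https://example.com/a"">Alpha</a></td><td>First API</td></tr>"
        & "<tr><td><a href=""https://example.com/b"">Beta</a></td><td>Second API</td></tr>"
        & "</tbody></table>",

    // Html.Table: one inner list per output column.
    // A third item (a function) reads something other than the element text,
    // here the href attribute of the link.
    Links = Html.Table(
        Source,
        {
            {"API", "tr > td:nth-child(1) > a"},
            {"API_URL", "tr > td:nth-child(1) > a", each [Attributes][href]?},
            {"Description", "tr > td:nth-child(2)"}
        },
        [RowSelector = "table > tbody > tr"]
    ),

    // Table.AddColumn: table, new column name, then a function evaluated per row
    #"Added Custom" = Table.AddColumn(Links, "IsHttps", each Text.StartsWith([API_URL], "https"))
in
    #"Added Custom"
```

Whether structural selectors like these can single out every category table on the real page is not something this sketch establishes; it only illustrates how the calls are put together.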

 
I also tried the Add Table Using Examples option in the Navigator of Power BI's Power Query. Here is the screenshot:
 
[Image Can Not Be Found]
 
 
But the problem is that it captures only one category, that is, one HTML table rather than all of them.
In this case, too, I was not able to add the URL column to the rest of the data.

Please let me know if you need any other information to recreate the steps.

 
I am certain that I am making some terrible mistake or overlooking something.
 
Please help me out.
 
Regards
Ritabrata Bhattacharya
Catalin Bombea
September 5, 2022 - 7:02 pm

Hi Ritabrata,

It's impossible to debug what you did without your test file. Can you upload a sample file that reproduces the error you mentioned?

Philip Treacy
September 6, 2022 - 12:01 pm

Hi Ritabrata,

Html.Table only works in PBI and it requires a CSS selector to get the data from the web page.  The tables on that GitHub page have no CSS classes to identify them so Html.Table can't be used.

The code in the file below grabs the entire page and then, using functions like Text.BetweenDelimiters, extracts the following data:

[Image: api table]
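The query from the attached file is not reproduced here. As a rough sketch of the same idea, assuming the page has already been loaded as text (for example with Text.FromBinary(Web.Contents(url))), the rows can be split apart and the pieces pulled out by their delimiters. The HTML fragment, the delimiters and the column names below are hypothetical and would need adjusting to the real page source.

```powerquery
let
    // Hypothetical fragment of page HTML; a real query would start from the downloaded page text
    Html = "<tr><td><a href=""https://example.com/a"">Alpha</a></td><td>First API</td></tr>"
        & "<tr><td><a href=""https://example.com/b"">Beta</a></td><td>Second API</td></tr>",

    // One text chunk per table row; the first chunk is whatever precedes the first <tr>, so skip it
    RowChunks = List.Skip(Text.Split(Html, "<tr>"), 1),

    // Pull the link text, link target and description out of a single row chunk
    ToRecord = (row as text) as record =>
        [
            API = Text.BetweenDelimiters(row, """>", "</a>"),
            API_URL = Text.BetweenDelimiters(row, "href=""", """"),
            Description = Text.BetweenDelimiters(row, "</td><td>", "</td>")
        ],

    // Combine the per-row records into a single table
    Result = Table.FromRecords(List.Transform(RowChunks, ToRecord))
in
    Result
```

Because this works on plain text rather than on parsed HTML, it depends on the markup staying consistent from row to row, so it is worth checking the delimiters against the actual page source before relying on it.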

Regards

Phil
