Getting data from the web is possible with Power Query in both Excel and Power BI. However, Excel is a little behind in Power Query features, therefore in this post I’m going to show you a new feature available in Power BI Desktop using Power Query to get data from the web by example.
In a few months we should see this feature available in Power Query for Excel.
Getting data using ‘Web by Example’ is super handy for data on web pages that aren’t structured in HTML tables, or where the structure is messy, like this book catalogue site, WorldCat:
From this site I want to extract a list of books by, or about author H J Eysenck. The website allows me to enter a search term that filters the data. You can see in the image above it has returned 21,610 results with the first page showing just 10.
Here is the URL if you want to take a look at the page: https://www.worldcat.org/search?q=Eysenck&fq=&dblist=638&start=1&qt=previous_page
The first challenge is getting the following data from the web page:
- Book Title
It’s a challenge because the data on the page is not formatted in a proper HTML table that Power Query can easily find.
Power Query Get Data from Web by Example
Step 1: Connect to the web page
Let’s start in Power BI Desktop - Home tab > Get Data > Web:
Step 2: Enter the URL
At the ‘From Web’ dialog box, enter the web page URL:
Tip: At the privacy setting choose ‘Anonymous’.
Step 3: Provide Examples
At the Navigator dialog box select ‘Add table using examples’:
Then in the bottom half of the window enter in the column labels you want and provide examples until Power Query detects the pattern:
Step 4: Edit or Load
The navigator dialog box will now have a Custom Table and you can click ‘Load’ if you’re ready to load it to your model, or ‘Edit’ to perform further transformations to the data in the Power Query editor:
The result is a table containing the columns you specified:
I know what you’re thinking…and no, you can’t copy the M code from the Advanced Editor in Power BI Desktop and paste it into Excel because Excel doesn’t yet recognise the functions Web.BrowserContents and Html.Table. Excel’s Power Query typically sees updates a few months after they are ‘generally available’ in Power BI Desktop.
The Second Challenge – Getting Data from Multiple Web Pages
Remember the first web page only returned 10 results. There were a further 21,600 results available, spread over a further 2,160 pages! In the next tutorial we’ll look at how we can automate pagination to get all results.
Download Power BI Desktop File
Enter your email address below to download the sample workbook.