Web Scraping Multiple Pages from Websites • My Online Training Hub

My last post looked at web scraping using VBA, where I loaded a single page and extracted content.

In this post I'm looking at loading multiple pages from a site and getting the content I want from each page.

Download Sample Code

Download a copy of the example workbook with the code used in this post.

Enter your email address below to download the workbook.

By submitting your email address you agree that we can email you our Excel newsletter.

Please enter a valid email address.

Download the workbook. Note: This is a .xlsm file. Please ensure your browser doesn't change the file extension on download.

You need to set a reference to the HTML Object Library it in your VBA project before this code will work.

Open your VBA editor (the keyboard shortcut is ALT+F11) and with your workbook selected in the VBAProject window, go to Tools -> References.

Look for Microsoft HTML Object Library and check the box beside it. Then click OK.

How To Load Multiple Pages

By creating an instance of Internet Explorer from VBA, you can navigate to whatever URL you like.

Create IE Object

Navigate to a Page

To access multiple pages I need to find out what the URL structure is for each page.

The URL for the home page is https://www.microsoft.com/en-us/microsoft-365/blog/excel/

The URL for the 2nd page is https://www.microsoft.com/en-us/microsoft-365/blog/excel/page/2/ so to access a page you need to add /page/x/ to the end of the main URL, where x is the page number you want.

Knowing this URL structure it's now easy to write code to page through however many pages you want.

I've written the code so that IE is visible as it loads different pages. It's good to see it working so you know it's doing what it should.

If you want to hide IE, change True to False in this line.

What Are We Extracting From the Pages?

I want to get the post title, and the link to that post. To do this I need to be able to find them in the HTML code that makes up the page.

Right click on a post title and choose Inspect element - or whatever similar wording is your your browser's menu.

This will open your browser's inspection window and will highlight the element you want to inspect - the blog title.

I can see that the blog title is a H3 - a type of heading - with the CSS class "c-heading". This H3 heading contains the <a> link to the blog post itself.

Knowing this I can write my VBA to extract all H3 headings

and then look for those with the class "c-heading".

Then get the <a> link from inside that H3 heading.

Now I write this out to the sheet as a hyperlink, and loop through all the links doing the same thing.

When I get to the last one I load the next page, and repeat the process.

Your sheet should now contain a list of hyperlinks to the blog posts.

Paging Content

Websites that have paged content should have a consistent structure for the URL's on each page, so it's just a case of finding that and then you can write your VBA to load each page.

8 thoughts on “Web Scraping Multiple Pages from Websites”

Hi, thank you for this template. I was wondering whether it is possible to check if a specific text is on a webiste, if so then print found in the coloumn next to the URL in excel. I need to do so for 750++ Websites. Any idea how to do so?

Thank you!

Philip Treacy

October 24, 2019 at 12:02 pm

Hi Tarek,

The short answer is yes this can be done. To give you a better answer I’d need to know what text you are looking for and a sample of websites where it is and isn’t found.

Please open a topic on the forum and supply all of this information and we can answer you there, with a sample workbook.

Regards

Phil
Reply

Thanks for posting this article about webscraping. I have a few questions. So here are the things I have:
A BOM list of parts (part description, part #, etc.) in spreadsheet form

Is it possible to create a VBA tool that looks at that BOM list by sub assembly and puts the BOM part information into a form, and have the VBA tool perform webscraping by finding spec information for each part online? i.e. There is a 8 input/output Turck communication block on the BOM. We want the tool to see that item and automatically do a web search on Turck’s website that looks for that specific part and a few desired specs of that part into a spreadsheet?

Have you ever created a VBA tool that pulls information from SAP?

Philip Treacy

September 8, 2019 at 1:01 pm

Hi Alex,

This sounds like something that could be done. If you can post a qs on the forum and include your file we’ll take a look.

You might want to look at Power Query to get data from SAP.

Phil
Reply

Hi Philip,

I changed the number of pages of the said website to go through in the browser then obtained 687 articles in a total. In addition to the tactic of iterating over multiple pages of web data using power query this is the first time I found another approach by using VBA. Thank you very much for such an amazing and valuable gift.

Philip Treacy

October 25, 2018 at 3:07 pm

You’re welcome Julian.
Reply

Same code if you want to scrape from Edge?

Philip Treacy

October 25, 2018 at 8:42 am

Hi Giorgio,

Edge does not support COM automation so can’t be accessed from VBA. IE is installed on all modern versions of Windows though so you should be able to use that.

Edge can be automated using WebDriver so if you really want to try that check this https://blogs.windows.com/msedgedev/2015/07/23/bringing-automated-testing-to-microsoft-edge-through-webdriver/

Regards

Phil
Reply

Related Topics