December 21, 2019
Hello, I am working on improving a Legislative Review project that tracks the Last Action on every Legislative Bill. Currently, I have VBA written to pull the Final and Signed PDF documents into a folder so I can run an advanced search for relevant legislative terms. What I am working on now is the last piece: pulling in the Last Action element for each bill (and perhaps the title too, if I get this working) so the bills can be tracked efficiently.
I have attached a copy of what I have created to date, which auto-opens the General Assembly page and includes the code to make sure the rest only runs once the page has loaded. The remainder is a hodgepodge of things I have pulled together, although I did just download the example from Web Scraping Paged Websites.
I would really like some input on whether extracting one or two elements is possible. I cannot find a blog post anywhere on extracting an element from paged websites. Thanks for any help from all you masterminds!
October 5, 2010
Hi Kai,
You can extract as many elements as you want from a web page. The issue is how easy that is to do, and that depends on how the web page is constructed.
If the web page elements are all given an ID then it is pretty straightforward. As all IDs in a web page should be unique, knowing an ID means you know exactly how to locate the piece of information you want.
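As a rough sketch of the ID case (this is not from the site in question, which does not use IDs for these elements): assuming the page has already been loaded into an MSHTML.HTMLDocument, a lookup by ID could look something like the following. The function name GetTextById and whatever ID you pass in are hypothetical, for illustration only.

```vba
' Sketch only: requires a reference to "Microsoft HTML Object Library"
' (Tools > References in the VBA editor).
' GetTextById is a hypothetical helper name, not part of any library.
Function GetTextById(HTML As MSHTML.HTMLDocument, ElementId As String) As String

    Dim Element As Object

    ' getElementById returns Nothing when no element has that ID
    Set Element = HTML.getElementById(ElementId)

    If Not Element Is Nothing Then
        ' innerText is the visible text inside the element
        GetTextById = Element.innerText
    End If

End Function
```

If no element has the ID, the function simply returns an empty string rather than raising an error.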
Unfortunately, not all elements are given IDs (it's not a requirement when making a page) and this is the case for the website you are looking at. So you need to locate the info you want in another way, and that can be by using the CSS class, which is how I've done it for you - see attached file.
In the image you included in your workbook, you'll see that the highlighted line has a section class="bill-last-action search-result-single-item". This element has 2 classes: bill-last-action and search-result-single-item. I used bill-last-action to pick out the information you want using this line of code:
HTML.getElementsByClassName("bill-last-action")(ResultNum).getElementsByTagName("span")(0)
This is used in a loop where ResultNum is a loop counter going through each of the 25 results per page, and the last bit, .getElementsByTagName("span")(0), picks out the first <span> inside that element, which is where the Last Action text sits.
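To give an idea of how that line sits inside a loop, here is a minimal sketch. It is not the exact code from the attached file: the procedure name PrintLastActions is made up, and it assumes the results page is already loaded into an MSHTML.HTMLDocument that you pass in. Rather than hard-coding 25, it loops over however many matching elements the page contains.

```vba
' Sketch only: requires a reference to "Microsoft HTML Object Library"
' (Tools > References in the VBA editor).
' PrintLastActions is a hypothetical name; HTML is assumed to hold a
' results page that has already been loaded.
Sub PrintLastActions(HTML As MSHTML.HTMLDocument)

    Dim Results As Object
    Dim Spans As Object
    Dim ResultNum As Long

    ' Every element on the page that carries the bill-last-action class
    Set Results = HTML.getElementsByClassName("bill-last-action")

    ' The collection is zero-based, so stop at Length - 1
    For ResultNum = 0 To Results.Length - 1

        Set Spans = Results(ResultNum).getElementsByTagName("span")

        ' Skip any result that has no <span> inside it
        If Spans.Length > 0 Then
            Debug.Print Spans(0).innerText
        End If

    Next ResultNum

End Sub
```

Checking Spans.Length before reading index 0 avoids a runtime error if one of the results happens to have no <span>.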
My code writes the Last Action to the Immediate Window using Debug.Print, so you just need to change that line to do what you want with the text, for example writing it to a worksheet cell instead.
I've written a few posts on web scraping which might help you:
https://www.myonlinetraininghu.....g-with-vba
https://www.myonlinetraininghu.....iple-pages
https://www.myonlinetraininghu.....ling-forms
TBH I prefer using Selenium, as I find it much easier to manipulate the web page with it, and I'd encourage you to give that a go. I find the native VBA syntax a bit difficult to use and not well documented 🙁
Cheers
Phil