I've written about web scraping before using VBA, but never about interacting with the website to do things like filling in forms.
We had a forum question asking how to do exactly that so I tried using the same approach as I had previously with the HTML Object Library, but when it came to grabbing elements from the web page, the results were inconsistent. Sometimes I'd get what I wanted, sometimes not.
Then I remembered Selenium which is software that automates browsers.
You can write code that instructs Selenium to do things like open a web page, fill in a form, or click a button, and it's really easy to use.
To use it with Excel you need to use the SeleniumBasic library which allows you to drive Selenium with VBA.
SeleniumBasic supports a smaller range of browsers than the full Selenium implementation, I chose to use Chrome. Please note that up to date versions of Firefox are not supported, and you will need to downgrade to a much older version of FF if you really want to use it.
Setting Up SeleniumBasic and Chrome
- Download and install the latest release of SeleniumBasic
- Download and install ChromeDriver
- Create a reference in your VBA project to the SeleniumBasic library
You can download SeleniumBasic from GitHub.
Run the .exe file and follow the instructions.
You need to install the version of ChromeDriver that matches your installed version of Chrome.
To check what version of Chrome you have installed, start it, and then in the address bar of a tab type chrome://settings/help and hit the Enter key.
I have version 80.0.3987.87 installed so I need version 80 of ChromeDriver.
Get the version of ChromeDriver that matches your version of Chrome from here.
You just need to match the major version number i.e 80, 79 etc
Clicking one of the links shown in the above image takes you to a high-tech looking web page.
Download and unzip the version that works with your operating system. I'm running Windows 10 Pro 64-bit so the win32 file is the one I want.
Unzipping will give you just one file chromedriver.exe, you need to copy this to the folder where SeleniumBasic was installed. In doing this you will be copying over the version of chromedriver.exe that came with SeleniumBasic.
In my case the installation location was C:\Users\pgt\AppData\Local\SeleniumBasic. It will be different for you and will depend on what version of Windows (or other OS) you are using.
Creating a Reference to the SeleniumBasic Library
With your new Excel workbook open go into the VBA editor (ALT+F11) and select the workbook.
From the menu, go to Tools->References, find Selenium Type Library and check the box beside it. Then click the OK button.
You can now write VBA that accesses Selenium.
Practical Use - Filling in Forms and Looking Up Multiple Records
Let's say we need to look up VAT (Value Added Tax) records for companies from Great Britain. We can do this from this website http://ec.europa.eu/taxation_customs/vies/
Choose a Member State from the dropdown, so 'GB-United Kingdom' in this instance.
Then enter the VAT Number, and click on Verify.
The result gives us various information, but we are only interested in the company name, and need to extract that from the web page and put it into Excel.
We have a list of VAT numbers on our sheet and want to automate the process of looking up the company name associated with each one.
Interacting With A Web Page
In order to do something like fill in a form or extract data from a web page, we need to understand the structure of the web page.
We can use the Developer Tools in the browser to inspect the underlying HTML that constructs the page.
In Chrome you can either type CTRL+SHIFT+I or right click on the page and click on Inspect. Other browsers are similar, right click on the web page in those browsers to bring up the menu that will give you access to the Dev Tools/Inspector.
This will open a pane at the bottom of the browser like this
As we are interested in entering the Member State and VAT Number we need to know how to find those things in the HTML.
If you put your mouse pointer over the Member State dropdown and right click, then click Inspect, the Inspector window will highlight the HTML that creates this dropdown.
Notice that the highlighted element has an id countryCombobox. We'll need that later.
If we right click on the top most VAT number box, and Inspect that you'll find it has an id called number. We'll remember that for later too.
The last thing we need to look at on this page is the Verify button. Right click it and Inspect and you'll see it has an id in HTML called submit. Store that id for later use too.
If we now look at the results page, we are interested in the company name. Right click it, Inspect and you'll find where it is located in the HTML.
Notice here that the element storing the company name doesn't have an id, so we'll have to use another way to identify it in our VBA so we can get the name into Excel.
We can do this using a method called FindElementByXPath.
Read more about using XPath in Selenium.
What this allows you to do is to refer to a HTML element based on its position within the HTML. You can use various tag names or attributes to indicate what piece of information you want.
In this case, we are after some text in a HTML table. Inside the table is a tag called <tbody>, which contains several <tr> tags.
The company name we want is located in the 2nd <td> tag in the 6th <tr> tag.
We can describe the location of the company name like this //table/tbody/tr/td. Note that there is only 1 table in the results page HTML. If there were more I'd have to specify which table I wanted by using, for example, table etc.
Now we know how to access the parts of the HTML pages we need, we can write our VBA.
VBA To Drive Selenium
Start by declaring and creating the Selenium driver.
I'm using a WHILE loop to work my way down Column A and read off the VAT numbers one by one. I'm using a variable called count to keep track of what row I'm on.
When the cell has 0 length (there's no VAT number in it), the WHILE loop ends.
Next the code tells Selenium to Get (load) the website.
The next 3 lines use a method called FindElementById to interact with the bits of the web page we found earlier using the Inspector.
Using the SendKeys method you can send keystrokes or text to the selected element.
So I send the country code GB, then the VAT number read from Column A of the sheet.
Finally the code Clicks on the submit element (the Verify button).
The results page will now load and we can get the company name with this line, and store it in a cell in Column B. So if count has reached 3, we store the company name in B3.
Repeat the process until all the VAT numbers have been checked.
Once you get everything installed, using Selenium is pretty easy. I've used it to fill in just a couple of things in a form, but it would not be hard to expand on this to complete more complicated forms, even over several pages.