Common VBA Methods & Properties used in web automation

In the previous post we looked at the Intellisense contextual helper menu, which lists the myriad Methods and Properties available to the InternetExplorer control object, as well as HTML objects and other objects in VBA.

Did you feel overwhelmed by all those unknown properties and methods? I sure do! But the good news is we don’t have to learn all those. Out of those thousands of choices, we only use a small handful in most of our web-scraping and VBA web-automation tasks.

Often-used VBA HTML object Properties

Properties – HTML example 1

Consider the following snippet of HTML (from this site’s footer):

<div class="site-info">
    <span class="site-title"><a rel="home" href="http://automatetheweb.net/">Automate the Web</a></span>
    <a href="https://wordpress.org/">Proudly powered by WordPress</a>
</div>

We can have VBA snag the div element like this (ele is just a variable name I’m making up):
Set ele = objIE.document.getElementsByClassName("site-info")(0)

Read it like this: "for all elements on the web page that have a class of ‘site-info’ get me the zero-ith (first) one, and attach it to the object variable ele".

 

Remember, once VBA runs that line of code it finds a div element with the class name site-info… then it tucks that div into the variable ele for later use…

… so then we can look at some of that div’s properties in the debug console like this…

Debug.Print ele.ClassName would return:

site-info

 

Debug.Print ele.TagName

div

 

Debug.Print ele.Id

 

(nothing … this element has no id=”” attached to it)

 

Debug.Print ele.href

ERROR: Object does not support this method or property

(silly rabbit, div’s don’t have href’s!)

 

Debug.Print ele.innerText

Automate the Web
Proudly powered by WordPress

(this can also be written as Debug.Print ele.textContent)

 

Debug.Print ele.innerHTML

<span class="site-title"><a rel="home" href="http://automatetheweb.net/">Automate the Web</a></span>
<a href="https://wordpress.org/">Proudly powered by WordPress</a>

Debug.Print ele.outerHTML

<div class="site-info">
    <span class="site-title"><a rel="home" href="http://automatetheweb.net/">Automate the Web</a></span>
    <a href="https://wordpress.org/">Proudly powered by WordPress</a>
</div>

 

Properties – HTML example 2

Now let’s look at some Properties of a different element. In the upper-right of this page there’s a blue email SUBSCRIBE button. If we right-click it, then click Inspect, we see this HTML:

<input tabindex="923" type="submit" class="button btn btn-primary" name="submit" 
value="subscribe" id="newsletters-2-button">

So let’s tell VBA to scrape that input element using the following line of code (iEle is just a made up variable name) …
Set iEle = objIE.document.getElementId("newsletters-2-button")

Read like this: "for the lone element on the page that has an id of ‘newsletters-2-button’, attach it to the object variable iEle".

 

When VBA runs that line of code it discovers that particular id belongs to an input element, and it holds this element in memory in the variable iEle.

So executing Debug.Print iEle.Id would produce:

newsletters-2-button

 

Debug.Print iEle.Type

submit

 

Debug.Print iEle.Name

submit

 

Debug.Print iEle.Value

subscribe

.Value is an example of a property that we can set as well as get, meaning it’s not a read-only property. We could assign it a value (write to it) as well, using code like the following, which enters a string of text into the input box:
objIE.document.getElementId("newsletters-2-button").Value = "hi@automatetheweb.net"

 

Properties – HTML example 3

<div class="header-image">
    <a href="http://automatetheweb.net/" rel="home">
        <img src="http://automatetheweb.net/wp-content/uploads/login-logo.jpg" width="350" height="40" alt="Automate the Web">
    </a>
</div>

This time let’s just access some elements directly — we won’t use a variable…

Debug.Print objIE.document.getElementsByTagName("a")(7).href
(assumes it’s the 8th <a> tag on the page we’re after… remember 1 is 0 in variable land, 2 is 1, 3 is 2, and so on)

http://automatetheweb.net/
 

Debug.Print objIE.document.getElementsByTagName("img")(3).src
(assumes it’s the 4th <img> tag on the page we’re after)

http://automatetheweb.net/wp-content/uploads/login-logo.jpg

 

Often-used VBA HTML object Methods

In addition to those amazing and exciting object Properties (think attributes, or nouns), VBA objects can have Methods (think actions, or verbs), too:

We use Methods like these to interact withn HTML objects:

objIE.document.getElementsByTagName("button")(0).click
clicks the first button element on the page.

objIE.document.getElementsByTagName("form")(0).submit
submits the first form on the page.

And we use Methods with the InternetExplorer object like these:

Assume we have
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer

Then…

objIE.navigate "http://google.com"
navigates the browswer

objIE.stop
manually stops browser navigation.

objIE.refresh
performs a browser refresh.

objIE.quit
closes the IE browser.

But the real web-scraping fun happens when using the following powerhouse Methods with InternetExplorer to hunt down (sometimes-illusive) web page elements. These VBA/IE Methods are powerful web-scraping allies that don’t just walk the DOM, they stomp all over it! …

getElementById   video with example use
getElementsByName
getElementsByTagName   video with example use
getElementsByClassName   video with example use

And to work alongside those go-getters, we employ these sophisticated, relationship-bound DOM-walkers:

Children
firstElementChild
nextElementChild
lastElementChild
nextElementSibling
lastElementSibling
previousElementSibling
parentElement

These family-oriented methods rely on the predictable, cohesive and hierarchical structure of the DOM to pinpoint and hook onto any element(s) on a web page.

 

Chaining multiple Properties and Methods to walk the DOM

So what is the DOM? Think of the Document Object Model as a complete and perfect snapshot of the entire loaded web page, stored in VBA as a tree-like structure. A page’s DOM is completely known and understood by VBA once a page has been successfully loaded. Study the image below, replacing the word jQuery with VBA

VBA and the Document Object Model of a web page
VBA and the Document Object Model of a web page
image source: this awesome site
 

In order to scrape web page data, we just need to direct VBA how and where within the DOM to access it. This is sometimes easier said then done, and questions with "how do I get to this data…" are the most common ones I hear. But accessing and interacting with a specific element or elements on a web page with VBA is always possible, and becomes increasingly easy to do with practice and learning to read the DOM (see video links above).

The most common technique when using VBA to find and interact with web page elements is using a method I call chaining (well, I probably read it somewhere) where we employ a combination of one or more of the afformentioned getters with one or more of the family-oriented methods.

Chaining VBA Properties and Methods – example 1

Considering the image above, let’s figure out how to get the text within the li element with the class name "fish". It’s important to note there are many ways to get there, just as there are different ways to climb a tree and still get to the top. Some of these ways are more efficient than others.

Here’s the most direct way that I see: objIE.document.getElementsByClassName("fish")(0).innerText.

And that route would work even if all those bottom-row li‘s had the class "fish", since (0) represents the first one.

But what if that li had no class? Then we could use: objIE.document.getElementsByClassName("menu_list")(0).Children(0).innerText

Here you can see we’ve used dots (.) to chain together the innerText property of the Children method of the getElementsByClassName method of the document method of the InternetExplorer object.

Chaining VBA Properties and Methods – example 2

Let’s consider another example, using a previous code sample, of chaining/connecting together multiple InternetExplorer and HTML methods and properties.

<div id="hdimg" class="header-image">
    <a href="http://automatetheweb.net/" rel="home">
        <img src="http://automatetheweb.net/wp-content/uploads/login-logo.jpg" width="350" height="40" alt="Automate the Web">
        <div><span style="font-size: 90%;">a div of text I added, just for giggles</span></div>
    </a>
</div>

Let’s assume
Dim objIE As InternetExplorer
Dim ele as Object
Set objIE = New InternetExplorer

Now let’s come up with a variety of ways to scrape out the line of text with the word "giggles" and print it in the debug console. Each of the following VBA examples prints "a div of text I added, just for giggles".

  • Debug.Print objIE.document.getElementById("hdimg").innerText
  • Debug.Print objIE.document.getElementsByClassName("header-image")(0).innerText
  • Debug.Print objIE.document.getElementsByClassName("header-image")(0).getElementsByTagName("span")(0).innerText
  • Debug.Print objIE.document.getElementsByClassName("header-image")(0).getElementsByTagName("img")(0).nextElementSibling.innerText
  • Debug.Print objIE.document.getElementsByClassName("header-image")(0).getElementsByTagName("img")(0).nextElementSibling.Children(0).innerText
  • Debug.Print objIE.document.getElementsByClassName("header-image")(0).getElementsByTagName("img")(0).nextElementSibling.getElementsByTagName("span").innerText
  • For each ele in objIE.document.getElementsByTagName("span")
       If instr(ele.style, "ize: 9") > 0 then
          Debug.print ele.innerText: Exit For
       End If
    Next

While all of these VBA code examples accomplish the same thing, clearly the first is the most obvious and practical, while the remainder are merely illustrative and increasingly obscure (even a little ridiculous).

Conclusion

The InternetExplorer and HTML Objects reference libraries include hundreds of Properties and Methods, but we really only use a few of them in most of our VBA web-automation projects.

Common Properies include:
innerText, innerHTML, outerHTML, value, href, src, name, id, className, tagName, and a small handful of others.

Common Methods include:
click, submit, navigate, stop, refresh, quit, getElementById, getElementsByName, getElementsByTagName, getElementsByClassName, Children, firstElementChild, nextElementChild, lastElementChild, nextElementSibling, lastElementSibling, previousElementSibling, parentElement, and a small handful of others.

Chaining together InternetExplorer and HTML Properties and Methods lets us walk anywhere along the DOM, interacting with a web page and its elements, allowing us to scrape data from and automate the web.

 

Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

VBA Intellisense and Early-binding vs Late-binding

Generally speaking, you want to use early binding, rather than late binding.

Early binding an object variable tells VBA specifically what we’ll be doing with it.
Late binding is generically declaring the variable an Object and VBA doesn’t know how we intend to use it.

 
VBA Intellisense helper menu
VBA Intellisense helper menu

Example of Early binding:
   Dim ie As InternetExplorer
   Set ie = New InternetExplorer

Advantages:
1) you get to use Intellisense
2) code runs faster
3) full access to object model
4) better error-checking

 

Example of Late binding:
   Dim ie As Object
   Set ie = CreateObject("InternetExplorer.Application")

Advantages:
1) can use variables for multiple purposes
2) better backwards compatability with previous VBA versions

 

I almost always perfer early-binding, having a clear advantage over late-binding. One exception, however, is that I’ll often declare a reuseable element object generically (Dim ele as Object) so that I can use it for multiple purposes. Like the following example where I use ele to capture an img element and then a div element:

Dim ele as Object unspecified Object variable, not bound to any particular use
Set ele = ie.document.getElementsByTagName("img")(3)
Debug.Print ele.src I just printed the source of an img!
For each ele in ie.document.getElementsByClassName("ytube")
    Debug.print ele.textContent I just printed the text of a div!
Next
 

But aside from mere convenience, there’s really no good reason to do it this way. Hence my advice to pretty much always use early binding.

 
Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

Excel – Enable macros or ActiveX controls by default

If you get a "Security Warning – Macros have been disabled" message in Excel, like the following…

Excel - 'Security Warning macros have been disabled'
Excel – ‘Security Warning macros have been disabled’

… just click Enable Content and Excel will remember your choice, and the next time you open this file you won’t get the warning.

However, if you rename the file, or if perhaps you have VBA save copies of the file with modified file names, you’ll see the warning again for each new file-name. This could be problematic if you later need VBA to dynamically open these files for editing (the code would get stopped in its tracks!).

In this case you would just need to set Excel to trust all macros for all files, like so…

File > Options > Trust Center > Trust Center Settings > Macro Settings > click Enable all macros

Excel Trust Center - Enable macros by default
Excel Trust Center – Enable macros by default

You would only tick Trust access to the VBA project object model if you’re creating sophisticated VBA code specifically designed to modify the code within your project’s Subroutines and Functions.

Occasionally you’ll come across (or write code for) an Excel workbook where ActiveX objects are used — usually form controls. Similar to above, if you need a global solution, check Enable all controls without restrictions and without prompting.

Excel Trust Center - Enable ActiveX controls by default
Excel Trust Center – Enable ActiveX controls by default
Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

reddit Q&A – How to extract data from HTML into Excel

reddit - How to extract HTML
reddit – How to extract HTML

Here’s a question from redditor Sajano90:

… need to know how i can create a list in excel with every product by its name and price (from) www.mindfactory.de
maybe for the start its not needed to be every product but maybe all the cases or all the lcd monitors or something like this.
i think its easiest to take the url from the special categorie of the produkt, and than try to take the first page with its 20 products and find the elements with the name and the price and list them along to another in an excel table.

The answer I gave was a small adaptation of SearchBot(), the beginner web scraping tutorial on the home page.

Under Tools > References… add these reference libraries: Microsoft HTML Object Library and Microsoft Internet Controls.

Sub GetData()
 
    Dim objIE As InternetExplorer
    Dim itemEle As Object
    Dim desc As String, pt1 As String, pt2 As String, price As String
    Dim y As Integer
 
    Set objIE = New InternetExplorer
    objIE.Visible = True
 
    objIE.navigate "http://www.mindfactory.de/Hardware/Monitore+(TFT).html"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    y = 1
 
    For Each itemEle In objIE.document.getElementsByClassName("item_grid_2_cols")
        desc = itemEle.getElementsByTagName("h3")(0).innerText
        pt1 = itemEle.getElementsByClassName("prod_preis")(0).getElementsByTagName("a")(0).textContent
        pt2 = itemEle.getElementsByClassName("prod_preis")(0).getElementsByTagName("a")(1).textContent
        price = pt1 & pt2
        Sheets("Sheet1").Range("A" & y).Value = desc
        Sheets("Sheet1").Range("B" & y).Value = price
        y = y + 1
    Next
 
End Sub

 

reddit Q&A - How to extract HTML, Excel result
reddit Q&A – How to extract HTML, Excel result

Finding the solution was fun because the layout of the target website was very convoluted, requiring the price ‘dollars’ and ‘cents’ to be initially captured in two different variables.

To download this Excel VBA project: reddit – HTML scrape mindfactory.xlsm

And here’s a similar reddit Q&A: Grabbing an HTML table and inserting the cells into Excel?

 
Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

Launch VBA code macro from a GO or RUN button

To add a GO or RUN button to an Excel sheet to start a VBA macro/subroutine…

Decide where on the sheet to place the button. I’ll make some room along the top the row…

Add 'GO' or 'RUN' button to start Excel VBA macro
Add ‘GO’ or ‘RUN’ button to start Excel VBA macro

Click the Insert tab > Illustrations section > Shapes, and choose a shape…

Insert shape onto Excel sheet
Insert shape onto Excel sheet

Position the shape where and how you want it…

Position shape on Excel sheet
Position shape on Excel sheet

Right-click the shape and click Edit text. Enter GO, RUN, START, or whatever you want the button to say…”

Right-click shape and 'Edit text'
Right-click shape and ‘Edit text’

Right-click the shape and click Assign Macro…, then choose which macro will start when the shape is clicked…

Right-click and 'assign macro' to the shape
Right-click and ‘assign macro’ to the shape

Choose the macro and click OK
Choose the macro and click OK

That’s it — just click the shape whenever you want to start your macro or chain of macros…

Click shape to run your VBA macro/subroutine
Click shape to run your VBA macro/subroutine

To download ths Excel VBA project: getelementsbytagname2.xlsm

Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

How to add VBA References – Internet Controls, HTML Object Library

In order to control the Internet Explorer web browser, VBA needs 2 additional reference libraries:

Microsoft HTML Object Library   and   Microsoft Internet Controls

 
 

Here’s how to add them…

Open the VBA Editor, and click Tools > References…

Visual Basic Editor - Tools > References
Visual Basic Editor – Tools > References

Make sure these 4 References are active by default:

  • Visual Basic for Applications
  • Microsoft Excel xx.x Object Library
  • OLE Automation
  • Microsoft Office xx.x Object Library
VBA default reference libraries
VBA default reference libraries

Scroll down a tick these 2 web-related References:

  • Microsoft HTML Ojbect Library
  • Microsoft Internet Controls
Add MS HTML Object Library, MS Internet Controls
Add MS HTML Object Library, MS Internet Controls

Resulting in…

VBA - Needed references for building web bots
VBA – Needed references for building web bots
Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

VBA – Web scraping with getElementsByTagName()

We already looked at getElementByID and getElementsByClassName as VBA methods for hooking onto web page elements.

But when elements (HTML tags) on a page don’t have an ID or class, another common approach for accessing them is using getElementsByTagName.

Examples:

objIE.document.getElementsByTagName("div")

   creates an array (group) of all div elements on a page.

Debug.Print objIE.document.getElementsByTagName("p").Length

   displays number of p (paragraph) elements on a page in the console.

Debug.Print objIE.document.getElementsByTagName("p")(0).innerHTML

   displays inner HTML of 1st p element on a page.

Debug.Print objIE.document.getElementsByTagName("p")(4).textContent

   displays text content of 5th p element on a page.

For Each ele In objIE.document.getElementsByTagName("input")
If ele.Value = "Sign In" then ele.Click: Exit For
Next

   clicks an input element that has a value equal to ‘Sign In’

For Each ele In objIE.document.getElementsByTagName("a")
If InStr(ele.href, "wp-admin") > 0 then objIE.navigate ele.href: Exit For
Next

   gets the link element containing ‘wp-admin’ and navigates to it.

Let’s practice. Let’s say every morning we need to create a report which requires going to a website, copying a bunch of data, and pasting it into Excel, and we’re wanting to automate the task…

I’ll just use a list of last names as example data. Doing this manually would look like…

Example data - a huge list of last names
Example data – a huge list of last names

Paste into Excel…

Pasting data into Excel - tedious!
Pasting data into Excel – tedious!

Not too hard, but many of our ‘Excel + web data’ tasks are more involved. And if they are daily, or otherwise repititive, you could save hours per year by automating them. The Sub below performs our task within seconds … and the code is eternally reusable. Assuming this is a 5-minute task, daily, building an automation super macro would save 30 hours a year!

Paste the code below into a VBA code module (how to open a new VBA module), or download below.

Sub GrabLastNames()
 
    'dimension (set aside memory for) our variables
    Dim objIE As InternetExplorer
    Dim ele As Object
    Dim y As Integer
 
    'start a new browser instance
    Set objIE = New InternetExplorer
    'make browser visible
    objIE.Visible = True
 
    'navigate to page with needed data
    objIE.navigate "http://names.mongabay.com/most_common_surnames.htm"
    'wait for page to load
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    'we will output data to excel, starting on row 1
    y = 1
 
    'look at all the 'tr' elements in the 'table' with id 'myTable',
    'and evaluate each, one at a time, using 'ele' variable
    For Each ele In objIE.document.getElementById("myTable").getElementsByTagName("tr")
        'show the text content of 'tr' element being looked at
        Debug.Print ele.textContent
        'each 'tr' (table row) element contains 4 children ('td') elements
        'put text of 1st 'td' in col A
        Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
        'put text of 2nd 'td' in col B
        Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
        'put text of 3rd 'td' in col C
        Sheets("Sheet1").Range("C" & y).Value = ele.Children(2).textContent
        'put text of 4th 'td' in col D
        Sheets("Sheet1").Range("D" & y).Value = ele.Children(3).textContent
        'increment row counter by 1
        y = y + 1
    'repeat until last ele has been evaluated
    Next
 
    'save the Excel workbook
    ActiveWorkbook.Save
 
End Sub

By using getElementsByTagName (especially when getElementByID and getElementsByClassName are not an option) you can scrape data from any web page at the push of a button.

What tasks can you automate?

To download this Excel VBA project: getelementsbytagname.xlsm

Recommended next: reddit Q&A – How to extract data from HTML into Excel

Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

How to remove your name and other meta info from an Excel file

Excel saves a lot of meta data in your Workbooks by default. Even personal information (like your name!) is stored, which may not always be desirable.

Excel - document properties
Excel – document properties

The best way to remove your name and other info from a workbook is not from within Excel, but from its containing folder in Windows:

Start by right-clicking the Excel file, then click Properties

Right-click Excel file, click Properties
Right-click Excel file, click Properties

In the Properties pane, click the Details tab, then Remove Properties and Personal Information

File Property Details - remove properties and personal info
File Property Details – remove properties and personal info

Next, in the Remove Properties pane, click Remove the following properties from this file, uncheck the box next to Authors and Last saved by (and any other info you want), and click OK

Remove Properties pane
Remove Properties pane

The items you selected are now removed from the file. That is, until you open the file again! Unfortunately, you’ll have to do this each time you open and close the file, as there is no way to prevent the automatic storing of this information.

Additionally, and annoyingly, Excel (and other software) does not allow removal of certain meta data like Date created, Date modified, and Date acessed (notice below, no check boxes next to these items). The Computer name, however, only looks like its being stored; this is actually just showing the computer currently viewing the file properties…

Som file properties can't be removed
Som file properties can’t be removed
Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

Speed-write VBA code by recording a macro

Sometimes the fastest and easiest way to write a tricky bit of VBA code is to have Excel do it for you, by recording a macro, then tweaking the created code.

For example, in our article about VBA’s getElementsByTagName() method, we wrote a subroutine to automate scraping data from a table on a page by pulling content from the child elements of <tr> tags…

VBA project example
VBA project example

Now let’s say we want to sort the data and make the headings bold. I think the VBA code to make text bold is something like Range(“A1”).Font.Bold, but honestly I don’t remember the exact syntax so I’d usually just take a guess or two then turn to Google to jog my memory. Same with sorting — I’d definitely need some help with that one.

But an easier way, especially for the harder stuff, is to simply record what you want to happen in a macro, then steal the VBA code that Excel created dynamcally during recording.

Let’s try it…

Make sure Excel’s Developer tab is showing and click the Record Macro button…

'Record Macro' button in Dev tab
‘Record Macro’ button in Excel Developer tab

Give your new macro a name…

Name your new Excel macro
Name your new Excel macro

Perform the actions you want to capture code for (in this case, sorting columns)…

Sorting Excel data in 4 clicks
Sorting Excel data in 4 clicks

Click the Stop Recording macro button…

'Stop Recording' macro button in Excel's Dev tab
‘Stop Recording’ macro button in Excel Developer tab

Click the newly-created module in the Visual Basic Editor. We can see a new VBA subroutine was automatically created by Excel when we recorded our macro…

New macro code dynamically-created by Excel in new module
New macro code dynamically-created by Excel in new module

Cut & paste the above macro code into your module, then call it by name in your Subroutine…

Cut & paste macro code to your module
Cut & paste macro code to your module

Now for making the column headings bold … you give that one a try!

To download this Excel VBA project: getelementsbytagname2.xlsm

Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr

How to insert a VBA code Module in Excel’s Visual Basic Editor

To insert a new VBA code module in Excel’s Visual Basic Editor, first open the Visual Basic Editor

Excel - click Visual Basic
Excel – click Visual Basic

Then click View > Project Explorer

VBE - click View, Project Explorer
VBE – click View, Project Explorer

Right-click VBAProject and click Insert > Module

VBE - click Insert, Module
VBE – click Insert, Module

A new, numbered module appears, waiting for your VBA code…

VBA - new Module1 created and waiting for VBA code
VBA – new Module1 created and waiting for VBA code
Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr