VBA – Web scraping with getElementsByTagName()

We already looked at getElementByID and getElementsByClassName as VBA methods for hooking onto web page elements.

But when elements (HTML tags) on a page don’t have an ID or class, another common approach for accessing them is using getElementsByTagName.

Examples:

objIE.document.getElementsByTagName("div")

   creates an array (group) of all div elements on a page.

Debug.Print objIE.document.getElementsByTagName("p").Length

   displays number of p (paragraph) elements on a page in the console.

Debug.Print objIE.document.getElementsByTagName("p")(0).innerHTML

   displays inner HTML of 1st p element on a page.

Debug.Print objIE.document.getElementsByTagName("p")(4).textContent

   displays text content of 5th p element on a page.

For Each ele In objIE.document.getElementsByTagName("input")
If ele.Value = "Sign In" then ele.Click: Exit For
Next

   clicks an input element that has a value equal to ‘Sign In’

For Each ele In objIE.document.getElementsByTagName("a")
If InStr(ele.href, "wp-admin") > 0 then objIE.navigate ele.href: Exit For
Next

   gets the link element containing ‘wp-admin’ and navigates to it.

Let’s practice. Let’s say every morning we need to create a report which requires going to a website, copying a bunch of data, and pasting it into Excel, and we’re wanting to automate the task…

I’ll just use a list of last names as example data. Doing this manually would look like…

Example data - a huge list of last names
Example data – a huge list of last names

Paste into Excel…

Pasting data into Excel - tedious!
Pasting data into Excel – tedious!

Not too hard, but many of our ‘Excel + web data’ tasks are more involved. And if they are daily, or otherwise repititive, you could save hours per year by automating them. The Sub below performs our task within seconds … and the code is eternally reusable. Assuming this is a 5-minute task, daily, building an automation super macro would save 30 hours a year!

Paste the code below into a VBA code module (how to open a new VBA module), or download below.

Sub GrabLastNames()
 
    'dimension (set aside memory for) our variables
    Dim objIE As InternetExplorer
    Dim ele As Object
    Dim y As Integer
 
    'start a new browser instance
    Set objIE = New InternetExplorer
    'make browser visible
    objIE.Visible = True
 
    'navigate to page with needed data
    objIE.navigate "http://names.mongabay.com/most_common_surnames.htm"
    'wait for page to load
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    'we will output data to excel, starting on row 1
    y = 1
 
    'look at all the 'tr' elements in the 'table' with id 'myTable',
    'and evaluate each, one at a time, using 'ele' variable
    For Each ele In objIE.document.getElementById("myTable").getElementsByTagName("tr")
        'show the text content of 'tr' element being looked at
        Debug.Print ele.textContent
        'each 'tr' (table row) element contains 4 children ('td') elements
        'put text of 1st 'td' in col A
        Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
        'put text of 2nd 'td' in col B
        Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
        'put text of 3rd 'td' in col C
        Sheets("Sheet1").Range("C" & y).Value = ele.Children(2).textContent
        'put text of 4th 'td' in col D
        Sheets("Sheet1").Range("D" & y).Value = ele.Children(3).textContent
        'increment row counter by 1
        y = y + 1
    'repeat until last ele has been evaluated
    Next
 
    'save the Excel workbook
    ActiveWorkbook.Save
 
End Sub

By using getElementsByTagName (especially when getElementByID and getElementsByClassName are not an option) you can scrape data from any web page at the push of a button.

What tasks can you automate?

To download this Excel VBA project: getelementsbytagname.xlsm

Recommended next: reddit Q&A – How to extract data from HTML into Excel

Facebooktwittergoogle_plusredditlinkedintumblrFacebooktwittergoogle_plusredditlinkedintumblr