reddit Q&A – How to extract data from HTML into Excel

reddit - How to extract HTML
reddit – How to extract HTML

Here’s a question from redditor Sajano90:

… need to know how i can create a list in excel with every product by its name and price (from)
maybe for the start its not needed to be every product but maybe all the cases or all the lcd monitors or something like this.
i think its easiest to take the url from the special categorie of the produkt, and than try to take the first page with its 20 products and find the elements with the name and the price and list them along to another in an excel table.

The answer I gave was a small adaptation of SearchBot(), the beginner web scraping tutorial on the home page.

Under Tools > References… add these reference libraries: Microsoft HTML Object Library and Microsoft Internet Controls.

Sub GetData()
    Dim objIE As InternetExplorer
    Dim itemEle As Object
    Dim desc As String, pt1 As String, pt2 As String, price As String
    Dim y As Integer
    Set objIE = New InternetExplorer
    objIE.Visible = True
    objIE.navigate ""
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    y = 1
    For Each itemEle In objIE.document.getElementsByClassName("item_grid_2_cols")
        desc = itemEle.getElementsByTagName("h3")(0).innerText
        pt1 = itemEle.getElementsByClassName("prod_preis")(0).getElementsByTagName("a")(0).textContent
        pt2 = itemEle.getElementsByClassName("prod_preis")(0).getElementsByTagName("a")(1).textContent
        price = pt1 & pt2
        Sheets("Sheet1").Range("A" & y).Value = desc
        Sheets("Sheet1").Range("B" & y).Value = price
        y = y + 1
End Sub


reddit Q&A - How to extract HTML, Excel result
reddit Q&A – How to extract HTML, Excel result

Finding the solution was fun because the layout of the target website was very convoluted, requiring the price ‘dollars’ and ‘cents’ to be initially captured in two different variables.

To download this Excel VBA project: reddit – HTML scrape mindfactory.xlsm

And here’s a similar reddit Q&A: Grabbing an HTML table and inserting the cells into Excel?