Thoughts of a software developer

26.09.2017 19:47 | Modified 11.08. 18:28
Using headless Chrome with Golang

Headless Chrome is a nice and welcome addition for rendering web pages without a visible browser window. PhantomJS was more or less the only solution until headless Chrome arrived. I tested how using headless Chrome from Go would work.

I was interested to see how long it took to load the initial HTML, and how long after that it took to render the page with all of its JavaScript, styles and images. I ended up making a chart where I could see both at a glance and compare them.

(blue dots: index.html fetched every minute, orange dots: page rendered)

First of all, I used an existing library, https://github.com/raff/godet/. This helped a lot.

Starting a headless chrome

Before doing anything with Chrome, we need to start it. You can do that manually or in code. The following picks the headless Chrome command based on the operating system.

var chromeapp string
switch runtime.GOOS {
case "darwin":
    chromeapp = `open "/Applications/Google Chrome Canary.app" --args`
case "linux":
    chromeapp = "chromium-browser"
}
if chromeapp != "" {
    // append the flags for headless mode and remote debugging
    chromeapp += " --headless --remote-debugging-port=9222 --hide-scrollbars" +
        " --disable-extensions --disable-gpu about:blank"
}

Then we need to run that command to start chrome.

errRun := runCommand(chromeapp)
if errRun != nil {
    log.Println("cannot start browser", errRun)
}
var err error
// Chrome needs a moment to start, so retry the connection up to 20 times
for i := 0; i < 20; i++ {
    if i > 0 {
        time.Sleep(500 * time.Millisecond)
    }
    remote, err = godet.Connect(port, false)
    if err == nil { // connection succeeded
        break
    }
    log.Println("connect", err)
}
if err != nil {
    log.Println("cannot connect to browser")
}

The runCommand function splits the command string into arguments and starts the process without waiting for it to exit:

func runCommand(commandString string) error {
    parts := args.GetArgs(commandString)
    cmd := exec.Command(parts[0], parts[1:]...)
    return cmd.Start()
}
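Splitting a command string is needed here because the macOS path contains spaces inside quotes. An alternative, sketched below, is to skip the string entirely and build the program name and argument list directly, so no parsing is required. chromeCommand is a hypothetical helper and not part of godet or urlmonitor. The flags and paths are the ones used earlier in this post.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// chromeCommand is a hypothetical helper returning the program and its
// arguments for the given GOOS value, or ok == false for an unknown system.
func chromeCommand(goos string) (name string, args []string, ok bool) {
	flags := []string{
		"--headless", "--remote-debugging-port=9222", "--hide-scrollbars",
		"--disable-extensions", "--disable-gpu", "about:blank",
	}
	switch goos {
	case "darwin":
		// "open <app> --args ..." passes the flags on to the application.
		args = append([]string{"/Applications/Google Chrome Canary.app", "--args"}, flags...)
		return "open", args, true
	case "linux":
		return "chromium-browser", flags, true
	}
	return "", nil, false
}

func main() {
	// In a real program runtime.GOOS would be passed here.
	name, args, ok := chromeCommand("linux")
	if !ok {
		fmt.Println("unsupported operating system")
		return
	}
	cmd := exec.Command(name, args...) // cmd.Start() would launch the browser
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

The example only prints the command. Calling cmd.Start() instead would launch the browser the same way runCommand does.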

Listening to events

After we have Chrome running headless, we can start listening for events.

With remote.AllEvents(true) we listen to all events (DOMEvents, PageEvents, NetworkEvents and so on).

With the callback functions

remote.CallbackEvent("Network.requestWillBeSent", func(params godet.Params) {
})

and

remote.CallbackEvent("Network.responseReceived", func(params godet.Params) {
})

we can keep track of requests sent and responses received, and then do what we want once a callback is called. In this case, we're only waiting for all of the responses to be received. There is also a callback for the page's load event.

remote.CallbackEvent("Page.loadEventFired", func(params godet.Params) {
})

Navigating to a URL and taking a screenshot

Navigate to a URL with remote.Navigate(d.Url). At this point we need to wait for the page to be loaded (Page.loadEventFired and all responses) before taking the screenshot, otherwise it would be blank. I used sync.WaitGroup to wait for the load event and all responses.

And finally, take a screenshot with remote.SaveScreenshot("assets/img/screenshot.png", 0644, 0, true).

You can view events at Page events and Network events

urlmonitor in action, monitor.jelinden.fi

Github urlmonitor