Headless Chrome is a welcome addition for rendering web pages without a visible browser window. Until it arrived, PhantomJS was the main option. I tried out how headless Chrome works together with Go (golang).
I wanted to see how long it took to load the initial HTML, and how much longer it then took to render the page with all of its JavaScript, styles and images. I ended up making a chart that shows both at a glance, so they can be compared.
(blue dots: index.html fetched every minute, orange dots: page rendered)
First of all, I used an existing library, https://github.com/raff/godet/, which helped a lot.
Starting headless Chrome
Before doing anything with Chrome, we need to start it. You can do that manually or from code; the following picks the headless Chrome command based on the operating system.
var chromeapp string
switch runtime.GOOS {
case "darwin":
	chromeapp = `open "/Applications/Google Chrome Canary.app" --args`
case "linux":
	chromeapp = "chromium-browser"
}
if chromeapp != "" {
	// Append the headless flags to whichever browser command was chosen.
	chromeapp += " --headless --remote-debugging-port=9222 --hide-scrollbars" +
		" --disable-extensions --disable-gpu about:blank"
}
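If chromium-browser is not installed, the failure only shows up later, when runCommand tries to start it. Below is a minimal sketch of checking candidate binaries up front using only exec.LookPath from the standard library. findBrowser is a hypothetical helper, and the extra names chromium and google-chrome are assumptions about how Chrome may be packaged on a given Linux system.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findBrowser returns the full path of the first candidate executable
// found on PATH, or an error if none of them is installed.
func findBrowser(candidates []string) (string, error) {
	for _, name := range candidates {
		if path, err := exec.LookPath(name); err == nil {
			return path, nil
		}
	}
	return "", fmt.Errorf("none of %v found on PATH", candidates)
}

func main() {
	// Candidate names are assumptions; adjust them for your system.
	path, err := findBrowser([]string{"chromium-browser", "chromium", "google-chrome"})
	if err != nil {
		fmt.Println("no browser:", err)
		return
	}
	fmt.Println("using", path)
}
```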
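Then we run that command to start Chrome and connect to it. Chrome needs a moment before its debugging port accepts connections, so the connection is retried up to 20 times, half a second apart.
errRun := runCommand(chromeapp)
if errRun != nil {
	log.Println("cannot start browser", errRun)
}

// godet.Connect takes the debugger address as "host:port";
// 9222 matches --remote-debugging-port above.
port := "localhost:9222"

var remote *godet.RemoteDebugger
var err error
for i := 0; i < 20; i++ {
	if i > 0 {
		time.Sleep(500 * time.Millisecond)
	}
	remote, err = godet.Connect(port, false) // false: no verbose logging
	if err == nil { // connection succeeded
		break
	}
	log.Println("connect", err)
}
if err != nil {
	// remote is nil here, so there is no point in continuing.
	log.Fatalln("cannot connect to browser:", err)
}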
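The runCommand function splits the command string into arguments with args.GetArgs (from github.com/gobs/args), which keeps the quoted macOS path together, and starts the process without waiting for it to exit. It also needs the errors and os/exec imports:
// runCommand starts commandString as a background process.
func runCommand(commandString string) error {
	parts := args.GetArgs(commandString)
	if len(parts) == 0 {
		return errors.New("empty command") // e.g. no browser chosen for this OS
	}
	cmd := exec.Command(parts[0], parts[1:]...)
	return cmd.Start() // does not wait for Chrome to exit
}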
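Quote-aware splitting matters because of the macOS command: strings.Fields would cut "/Applications/Google Chrome Canary.app" into three pieces with stray quote characters. The sketch below is a simplified, stdlib-only illustration of that kind of splitting. splitArgs is hypothetical and is not the gobs/args implementation; it handles double quotes only, with no escapes or single quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// splitArgs splits a command line on unquoted spaces and tabs and strips
// double quotes, so a quoted path containing spaces stays one argument.
func splitArgs(line string) []string {
	var out []string
	var cur strings.Builder
	inQuotes, inArg := false, false
	for _, r := range line {
		switch {
		case r == '"':
			inQuotes = !inQuotes
			inArg = true // "" still counts as an (empty) argument
		case (r == ' ' || r == '\t') && !inQuotes:
			if inArg {
				out = append(out, cur.String())
				cur.Reset()
				inArg = false
			}
		default:
			cur.WriteRune(r)
			inArg = true
		}
	}
	if inArg {
		out = append(out, cur.String())
	}
	return out
}

func main() {
	parts := splitArgs(`open "/Applications/Google Chrome Canary.app" --args --headless`)
	fmt.Printf("%q\n", parts)
	// ["open" "/Applications/Google Chrome Canary.app" "--args" "--headless"]
}
```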
Listening to events
Once Chrome is running headless, we can start listening to events. Calling remote.AllEvents(true) enables all of them (DOM events, Page events, Network events and so on).
With callback functions we can keep track of the requests sent and the responses received:
remote.CallbackEvent("Network.requestWillBeSent", func(params godet.Params) {
	// a request was sent; params["requestId"] identifies it
})
remote.CallbackEvent("Network.responseReceived", func(params godet.Params) {
	// the response for params["requestId"] arrived
})
Inside each callback we can do whatever we need. In this case we only wait until all responses have been received. There is also a callback for the page's load event:
remote.CallbackEvent("Page.loadEventFired", func(params godet.Params) {
	// the page's load event fired
})
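One way to track outstanding requests in those callbacks, sketched under a few assumptions: requests are matched by the requestId field that both Network events carry, Params is a local stand-in for godet.Params (assumed to be a map[string]interface{}), and tracker is a hypothetical type. The mutex is there because the callbacks may run on a different goroutine from the code that navigates and waits.

```go
package main

import (
	"fmt"
	"sync"
)

// Params stands in for godet.Params (assumed map[string]interface{}).
type Params = map[string]interface{}

// tracker remembers requests that have been sent but not yet answered.
type tracker struct {
	mu      sync.Mutex
	pending map[string]bool
}

func newTracker() *tracker { return &tracker{pending: map[string]bool{}} }

// requestSent would be called from the Network.requestWillBeSent callback.
func (t *tracker) requestSent(p Params) {
	if id, ok := p["requestId"].(string); ok {
		t.mu.Lock()
		t.pending[id] = true
		t.mu.Unlock()
	}
}

// responseDone would be called from the Network.responseReceived callback.
// Unknown or missing request IDs are ignored.
func (t *tracker) responseDone(p Params) {
	if id, ok := p["requestId"].(string); ok {
		t.mu.Lock()
		delete(t.pending, id)
		t.mu.Unlock()
	}
}

// Pending reports how many requests are still waiting for a response.
func (t *tracker) Pending() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.pending)
}

func main() {
	t := newTracker()
	t.requestSent(Params{"requestId": "1"})
	t.requestSent(Params{"requestId": "2"})
	t.responseDone(Params{"requestId": "1"})
	fmt.Println(t.Pending()) // 1
}
```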
Navigating to a URL and taking a screenshot
Navigate to the URL with remote.Navigate(d.Url). At this point we need to wait for the page to finish loading (Page.loadEventFired plus all responses) before taking the screenshot; otherwise the screenshot would be blank. I used a sync.WaitGroup to wait for the load event and all responses.
Finally, take the screenshot (0644 is the file mode of the saved image):
remote.SaveScreenshot("assets/img/screenshot.png", 0644, 0, true)
The full list of events is documented in the Chrome DevTools Protocol reference for the Page and Network domains.