Full Page Website Screenshots

I previously wrote about taking website screenshots with gowitness. That works great for basic website screenshots. However my most recent project required full page screenshots which, unfortunately, gowitness does not yet support. It’s also unlikely to be added anytime soon. Under the hood, gowitness simply uses headless Chrome, which itself doesn’t support full page screenshots. You can however take screenshots directly with Chrome from the command line as shown here.

chrome --headless --disable-gpu --hide-scrollbars --disable-crash-reporter --window-size="1440,900" --screenshot=screenshot.png --virtual-time-budget=2000

My first thought was, maybe I could get Chrome to handle the full page screenshots directly using some cleaver pixel emulation. However, after a few hours of Googling and testing I’ve concluded it’s just not possible.

Full page screenshots are an artificial recreation. 📸

Imagining how a website should look as a single image is really just a stitched together image of something that never truly existed in the first place. Websites are dynamic. As you scroll you are taking a peak at a living thing. Attempting to put that motion into a single picture gets messy. How are statically positioned elements handled? How are headers that tuck themselves up into a hamburger menus as you scroll handled? How about other pullouts or animations which happen as you scroll? All of this considered, you expect a few arguments to headless Chrome to understand what you want? No, that’s just not going to happen.

Solution, use a paid screenshot service. 💰

Creating a full page screenshot requires some opinions to be taken. Rather then spend days or weeks in code I decided to implement a paid screenshot service for my project. Paid services also come packed with a bunch of other nice features like retina-quality images and waiting for specific DOM elements.

I tested out two different services: Urlbox and ScreenshotsCloud. URLbox had some quirky issues with retina images so end up using ScreenshotsCloud. My implementation is written in bash which unfortunately doesn’t have a built-in method for encoding URLs. Other than that complicated piece, it’s pretty much just using curl to fetch an image from a properly configured API url.

urlencode() {
    local _length="${#1}"
    for (( _offset = 0 ; _offset < _length ; _offset++ )); do
        _print_offset="${1:_offset:1}"
        case "${_print_offset}" in
            [a-zA-Z0-9.~_-]) printf "${_print_offset}" ;;
            ' ') printf + ;;
            *) printf '%%%X' "'${_print_offset}" ;;
        esac
    done
}

url="https://anchor.host"
user_agent="captaincore/1.0 (CaptainCore Capture by CaptainCore.io)"
screenshots_cloud_api_key="############"
screenshots_cloud_api_secret="############"
query="url=$( urlencode "${url}" )&full_page=true&viewport_width=1280&user_agent=$( urlencode "$user_agent" )&format=jpg&pixel_ratio=2"
screenshots_cloud_token=$( printf '%s' "$query" | openssl sha1 -hmac "$screenshots_cloud_api_secret" | sed 's/^.* //' )

curl "https://api.screenshots.cloud/v1/screenshot/$screenshots_cloud_api_key/$screenshots_cloud_token/?$query" > screenshot.jpg"

In the above example, pixel_ratio=2 asks for a retina image and full_page=true makes it full page. I’m also passing in a unique user_agent which can help keep website traffic generated from my screenshots isolated to just my utility. The results are shown in this beautiful looking full page screenshot. 😁