The Need for Uptime Monitoring

One part of keeping your WordPress website healthy is using an uptime monitor. I use a free one called Jetpack Monitor. Jetpack Monitor is very basic. It checks your website every 5 minutes and sends an email notice if something is wrong.
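To make the idea concrete, here is a minimal sketch of what a basic uptime check could look like. It is not how Jetpack Monitor is implemented; the function names, the 10-second timeout, and the `notify` callback are all assumptions made for illustration. Only the 5-minute interval comes from the description above.

```python
import time
import urllib.error
import urllib.request

CHECK_INTERVAL_SECONDS = 5 * 60  # the 5-minute interval described above


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return None  # DNS failure, refused connection, timeout, and so on


def is_up(status):
    """Treat any 2xx or 3xx response as 'up' (an assumption for this sketch)."""
    return status is not None and 200 <= status < 400


def run_monitor(url, notify, sleep=time.sleep):
    """Check url forever, calling the hypothetical notify(url, status) when it looks down."""
    while True:
        status = fetch_status(url)
        if not is_up(status):
            notify(url, status)
        sleep(CHECK_INTERVAL_SECONDS)
```

In practice `notify` would be whatever sends the email; something like `run_monitor("https://example.com", print)` is enough to try it out.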

Lots of things can go wrong

There are many reasons for a website to go offline. WordPress is just one small part of the stack. Anything relating to the domain, DNS, web server, software, network, power or data center can affect the uptime of a website.

[Image: 504-error.png]
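Because so many layers are involved, it can help to probe them in order and see which one fails first. The following is a rough sketch under my own assumptions: the helper names, port 443, and the 5-second timeout are illustrative and not part of any particular monitoring tool.

```python
import socket


def dns_resolves(host):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


def port_open(host, port=443, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failing_layer(checks):
    """Run (layer_name, check) pairs in order; return the first name whose check fails.

    Returns None when every check passes.
    """
    for name, check in checks:
        if not check():
            return name
    return None
```

For example, `first_failing_layer([("dns", lambda: dns_resolves("example.com")), ("tcp", lambda: port_open("example.com"))])` returns "dns" if the name does not resolve, "tcp" if the name resolves but the port is closed, and None if both checks pass.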

Monitoring is useful if you have a plan

Every website should have an uptime monitor; however, not everyone needs to watch it. I’d recommend deciding ahead of time who is responsible for watching the monitor and what action they will take. Whenever I receive a down notification from Jetpack, this is what I do.

  1. Check to see if the website is actually down. Jetpack might have received a one-time error; however, that doesn’t mean your website is still down. It could have been a brief connection issue or an abnormal server load.
  2. If down, then assess how widespread the outage is. Is it affecting an individual site, a group of sites, an entire server or an entire hosting provider? Finding patterns can sometimes be tricky; however, there is generally a pattern to be found.
    1. Individual outages are generally caused by the domain, DNS or a misconfiguration like a PHP syntax error or home page 404. A web developer can usually identify the source of the problem fairly quickly and apply a workaround.
    2. Group outages are typically caused by DNS or by a larger cloud provider outage. For example, when Amazon S3 is having problems, it can cause mishaps for services which rely on Amazon S3.
    3. Entire server outages are the easiest to identify. They could be load, software or hardware related. If your server is in a big data center like Google’s, then hardware problems are typically identified and resolved within 10 to 15 minutes.
  3. Communicate what is known. If an outage is affecting a significant portion of clients for more than 10 minutes, then I post what is known on my status page.
  4. Seek out more info. This generally involves reaching out to the hosting provider and reviewing various status pages.
  5. Follow up after things are stable. This can be tricky when things are going up and down. Generally the best advice is to give it time. I feel a lot better if I see things running stable for 4 hours vs 30 minutes.
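Parts of this routine can be sketched in code. The example below is only an illustration under stated assumptions: the function names, the three recheck attempts, the 10-second pause, and the `sites_by_server` mapping are hypothetical. The 10-minute and 4-hour thresholds come from the steps above. Labeling an outage as "provider" when every monitored site is down assumes all of the sites live with a single hosting provider.

```python
import time
from datetime import datetime, timedelta


def confirm_down(check, attempts=3, delay=10, pause=time.sleep):
    """Step 1: recheck a few times; report down only if every attempt fails."""
    for attempt in range(attempts):
        if check():
            return False  # the site answered, so the alert was a blip
        if attempt < attempts - 1:
            pause(delay)
    return True


def classify_outage(down_sites, sites_by_server):
    """Step 2: rough scope of an outage given {server_name: [site, ...]}."""
    down = set(down_sites)
    if not down:
        return "none"
    all_sites = {site for sites in sites_by_server.values() for site in sites}
    if len(sites_by_server) > 1 and down >= all_sites:
        return "provider"
    for sites in sites_by_server.values():
        if len(sites) > 1 and down >= set(sites):
            return "server"
    return "individual" if len(down) == 1 else "group"


def should_post_status(started_at, now, threshold=timedelta(minutes=10)):
    """Step 3: post to the status page once the outage exceeds 10 minutes."""
    return now - started_at > threshold


def is_stable(last_failure_at, now, window=timedelta(hours=4)):
    """Step 5: call things stable after 4 hours without a failure."""
    return now - last_failure_at >= window
```

Whether an outage affects a "significant portion of clients" is still a judgment call, so the sketch leaves that decision to the person watching the monitor.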