07/03/2012

Website Performance Monitoring: Going Beyond the 6 Nines

Let's say you've done all the "due diligence" you can possibly stand in making your website as reliable as possible. You have a reliable Internet connection (or two), your server has plenty of resources (disk space, RAM and CPU), your database is replicated, you have backup power and your application has zero bugs! (There's not even so much as a single error message in your web server logs!)

What's Next?

Well, if your site isn't fast, having "6 nines" uptime won't really matter. Internet users are getting more and more demanding. If they search for you and click, they won't wait around more than a few seconds for your page to load. Then, it's a simple click of the back button and you are dead to them - forever. At least that's how we browse the 'net.

You need a monitoring tool that will give you performance information for your HTTP request/response transactions. Knowing when your site is down isn't enough, and neither is knowing how long the whole response took. You need detail. We break it down like this:

  • DNS Time - The time it takes to get your authoritative name server to resolve your domain name and return the IP address for the requested host. And we do a real lookup every time (no cached lookups).
  • Connect Time - The time it takes to send your web server a connection request (TCP SYN packet) and receive its response (TCP SYN-ACK packet).
  • SSL Time - For secure pages this is the time required to complete the SSL/TLS handshake (negotiating the cipher, checking the certificate and exchanging keys) before any HTTP data is sent.
  • TTFB (or time to first byte) - This is the time required to send your server the request and wait for the first byte of response. It is typically a good indicator of how long it takes your server to build the response page.
  • TTLB (or time to last byte) - This is the time required to receive the body (content) of the HTTP response. We also use this number to estimate the data transfer rate so you can see how your bandwidth is working.
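
To make the breakdown above concrete, here is a minimal Python sketch that times each phase of a single GET request using only the standard library. It is an illustration, not how our monitoring is implemented: the function name measure, its parameters and the dictionary keys are all made up for this example. It also simplifies a few things. The DNS step goes through the operating system's resolver, which may answer from a cache, and the last phase counts the rest of the headers along with the body.

```python
import socket
import ssl
import time


def measure(host, port=80, path="/", use_tls=False, timeout=10.0):
    """Return per-phase timings (in seconds) for one HTTP GET.

    Hypothetical helper for illustration only.
    """
    timings = {}

    # DNS time: resolve the host name. Assumption: this uses the OS
    # resolver, so the answer may come from a cache.
    start = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    timings["dns"] = time.perf_counter() - start

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Connect time: the TCP handshake (SYN out, SYN-ACK back).
        start = time.perf_counter()
        sock.connect(addr)
        timings["connect"] = time.perf_counter() - start

        # SSL time: the TLS handshake, only for secure pages.
        timings["ssl"] = 0.0
        if use_tls:
            start = time.perf_counter()
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)
            timings["ssl"] = time.perf_counter() - start

        request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                   "Connection: close\r\n\r\n").encode("ascii")

        # TTFB: send the request and wait for the first response byte.
        start = time.perf_counter()
        sock.sendall(request)
        first = sock.recv(1)
        timings["ttfb"] = time.perf_counter() - start

        # TTLB: read everything else until the server closes the
        # connection (remaining headers plus body in this sketch).
        start = time.perf_counter()
        size = len(first)
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            size += len(chunk)
        timings["ttlb"] = time.perf_counter() - start
    finally:
        sock.close()

    # Rough transfer rate in bytes per second, estimated from TTLB.
    timings["bytes"] = size
    timings["rate"] = size / timings["ttlb"] if timings["ttlb"] > 0 else None
    return timings
```

Calling measure("example.com", 443, use_tls=True) would return a dictionary with one entry per phase. A single sample is noisy, so in practice you would repeat the request from several locations and look at the trend rather than one number.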

The beauty of having all this information is it gives you a fighting chance to really improve the speed of your responses. For example, without this data, you might spend a lot of time and money making your web server a lean, mean fighting machine, when really it's your DNS lookups that are slowing the whole thing down - and they could be coming from a different server.

Another example: If you just have one overall response time number to look at, and it's a big number, you might think you need more bandwidth. With detailed data, it will become obvious quickly if your content generation (e.g. a slow database query) is the real culprit. In that case, TTFB will be much longer than any of the other values and will get your attention like a shining beacon of light.
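
Both examples come down to asking which phase eats the largest share of the total. Here is a small Python sketch of that check. The function name slowest_phase and the sample numbers are invented for illustration; they are not real measurements.

```python
PHASES = ("dns", "connect", "ssl", "ttfb", "ttlb")


def slowest_phase(timings):
    """Return (phase, share_of_total) for the longest phase.

    Hypothetical helper: timings maps each phase name to seconds.
    """
    total = sum(timings[p] for p in PHASES)
    worst = max(PHASES, key=lambda p: timings[p])
    share = timings[worst] / total if total > 0 else 0.0
    return worst, share


# Made-up sample: a slow database query shows up as a large TTFB.
slow_query = {"dns": 0.030, "connect": 0.025, "ssl": 0.0,
              "ttfb": 1.200, "ttlb": 0.080}

# Made-up sample: the web server is quick, but DNS lookups are slow.
slow_dns = {"dns": 0.900, "connect": 0.025, "ssl": 0.0,
            "ttfb": 0.060, "ttlb": 0.080}

for name, sample in (("slow_query", slow_query), ("slow_dns", slow_dns)):
    phase, share = slowest_phase(sample)
    print(f"{name}: {phase} takes {share:.0%} of the total")
```

With only the overall response time, the two samples look like the same kind of problem. Broken down by phase, the first points at content generation and the second at name resolution.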

We want to do more than just wake you up in the middle of the night because something crashed. We want to give you the information you need to put your best foot forward out there on the public Interwebs.