Why Should I Trust Alertra Uptime Data?

One reason companies use Alertra to monitor their servers is so they can publish their uptime. Uptime percentage is used as a selling point and is a factor in many Service Level Agreements (SLAs). But why should you trust that the Alertra uptime data published by your hosting provider or co-location facility is accurate?

Uptime data security.

"My hosting provider uses your service and publishes your reports, but what keeps them from altering the data and 'erasing' inconvenient outages?" As a matter of Alertra company policy, outages will not be deleted from a device unless an internal problem with the Alertra network monitoring software caused the outage to be reported in error.

Our stats are used by hundreds of web hosting providers to prove their uptime to their customers, and the only reason for those customers to trust the stats is that we are an unbiased third party. If we deleted outages, that trust would evaporate and those web hosting providers would have no way to demonstrate their uptime.

While your hosting provider or co-location facility owns the account with us and configures the devices, the outage data cannot be deleted by the account holder. This is why it is important to be precise about what you are monitoring when setting up the device in Alertra. For more information on strategies for Internet server monitoring, see this article.

The only recourse for getting rid of unwanted outages is to delete and re-add the device. That gets rid of the outage, but you also lose the history for the device.

Alertra accuracy.

For most device types, including all of our various HTTP monitors, Alertra employs a full RFC-based protocol check of the server (see note 1). When our server monitoring software performs a protocol-based check of a device, it connects to the server and port, interacts with the service, and considers the device down if the service does not generate the proper responses. For HTTP devices, our interaction with the HTTP server is very similar to how your web browser interacts with that same server.
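As an illustration of what a protocol-based HTTP check involves, here is a minimal Python sketch. It is not Alertra's monitoring code. The function name http_check, the default timeout, and the rule that any 2xx or 3xx status counts as a proper response are all assumptions made for this example.

```python
import http.client


def http_check(host, port=80, path="/", timeout=10.0):
    """Return True if the server completes an HTTP exchange with a 2xx/3xx status.

    Hypothetical helper for illustration only: a real monitor may apply
    stricter rules (content matching, redirect handling, TLS validation).
    """
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            # Speak the protocol: send a request and parse the response,
            # rather than only testing that the port accepts connections.
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()
            return 200 <= response.status < 400
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed HTTP response.
        return False
```

The point of the sketch is that the device counts as up only when the service answers correctly, not merely when the machine is reachable.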

To ensure that we do not falsely mark a site down when it is really up, we use a network of remote monitoring stations set up all over the world. These stations are all on different networks with varying connection paths to Internet backbones. In addition, Alertra will not judge a device to be down unless three of our monitoring stations agree that it is not functioning. This keeps a local outage at or near one of our stations from causing a perfectly working web server to be considered down.
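The agreement rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Alertra's implementation: the function name is_device_down, the station names, and the shape of the input (a mapping from station to whether that station saw a failure) are hypothetical. Only the threshold of three agreeing stations comes from the text above.

```python
def is_device_down(station_reports, quorum=3):
    """Decide whether a device is down from per-station check results.

    station_reports maps a station name to True if that station's check
    failed. The device is judged down only if at least `quorum` stations
    report a failure.
    """
    failures = sum(1 for failed in station_reports.values() if failed)
    return failures >= quorum


# One station has a local network problem; the others reach the server fine.
local_glitch = {"station-a": True, "station-b": False, "station-c": False}

# Three stations on different networks all fail to get a proper response.
real_outage = {"station-a": True, "station-b": True, "station-c": True}

print(is_device_down(local_glitch))  # False
print(is_device_down(real_outage))   # True
```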

As you can see, Alertra's network monitoring software was designed to ensure that real outages are detected while false positives are virtually eliminated. However, how well our software detects an outage also depends on what it is looking for and how often it looks.

Here is a situation that comes up frequently: Alertra is configured to check a server using PING, and the web server software on that machine quits working. If you want to access the web service running on that server, then to you the device is obviously down. However, PING only gauges whether the server has connectivity, not whether a particular service is running on it. As long as the server responds to our PING packets, Alertra will record the server as up during that time. The stats are accurate, but the service you are trying to use on the server may not be the one being monitored.

Here is a pop quiz: if Alertra is monitoring a web server every 20 minutes (3 times per hour), what is the maximum amount of time the server can be down without Alertra marking it down? Answer: 19 minutes 59 seconds. If the server goes down right after one of our checks, Alertra will not mark it down until the next check 20 minutes later (see note 2).
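The arithmetic behind the quiz can be worked through in Python. This is a simplified sketch: it assumes checks run on a fixed schedule at t = 0, 20 minutes, 40 minutes, and so on, measures time in whole seconds, and ignores the extra time needed for other stations to confirm a failure. The function names are hypothetical.

```python
def seconds_until_detection(outage_start, interval):
    """Seconds from the start of an outage to the next scheduled check.

    Checks are assumed to run at t = 0, interval, 2 * interval, ...
    """
    since_last_check = outage_start % interval
    if since_last_check == 0:
        return 0  # the outage begins exactly when a check runs
    return interval - since_last_check


def outage_goes_unrecorded(outage_start, duration, interval):
    """True if the server recovers before the next scheduled check runs."""
    return duration < seconds_until_detection(outage_start, interval)


INTERVAL = 20 * 60  # a 20-minute check interval, in seconds

# Worst case: the server fails one second after a check.
worst_case = seconds_until_detection(outage_start=1, interval=INTERVAL)
print(divmod(worst_case, 60))  # (19, 59)

# A 10-minute outage that starts just after a check is never seen.
print(outage_goes_unrecorded(1, 10 * 60, INTERVAL))  # True
```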

In judging the accuracy of Alertra's uptime data, you need to consider two things: whether the device has been configured to monitor the services you are interested in, and whether the check interval is short enough to build an accurate picture of the stability of those services given the length of time the monitoring has been in effect. In general, longer check intervals require a longer history of monitoring to give an accurate picture.
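To see why a longer interval needs a longer history, consider a toy model in Python. It is a sketch under simplifying assumptions, not a description of how Alertra computes its statistics: time is measured in whole minutes, a single day contains one 30-minute outage, and uptime is estimated as the fraction of checks that found the server up. The function name sampled_uptime and the specific numbers are made up for the example.

```python
def sampled_uptime(down_minutes, total_minutes, interval, offset=0):
    """Estimate uptime as the fraction of periodic checks that found the server up.

    down_minutes is the set of minutes during which the server was down.
    Checks run at offset, offset + interval, ... up to total_minutes.
    """
    check_times = range(offset, total_minutes, interval)
    up_checks = sum(1 for t in check_times if t not in down_minutes)
    return up_checks / len(check_times)


DAY = 24 * 60                    # one day of history, in minutes
down = set(range(100, 130))      # a single 30-minute outage
true_uptime = 1 - len(down) / DAY

# With a 20-minute interval, the estimate depends on where the checks fall.
estimates = [sampled_uptime(down, DAY, 20, offset) for offset in range(20)]

print(f"true uptime:      {true_uptime:.4f}")
print(f"lowest estimate:  {min(estimates):.4f}")
print(f"highest estimate: {max(estimates):.4f}")
print(f"average estimate: {sum(estimates) / len(estimates):.4f}")
```

In this model, any single day of 20-minute checks gives an estimate a little above or below the true figure of about 97.9%, depending on how the checks happen to line up with the outage. Averaged over all possible alignments, the estimate matches the true uptime, so the interval adds noise rather than bias, and that noise shrinks as more history accumulates.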

Summary

At Alertra we have designed, built, and continue to maintain and expand network monitoring services that provide the most reliable monitoring in the world. We do that by performing full RFC protocol checks of servers from a worldwide network of monitoring stations. If a device monitored by Alertra goes down, we will detect it. If the device is functioning properly, our redundant checks ensure it will not falsely be recorded as down. And that uptime data cannot be altered to "hide" outages.

These factors mean that when your hosting provider or co-location facility publishes their Alertra uptime statistics, you can have confidence that the uptime percentages reflect the real-world availability of that server.


1 From Wikipedia: "A Request for Comments (RFC) document is one of a series of numbered Internet informational documents and standards very widely followed by both commercial software and freeware in the Internet and Unix communities. ... The basic communication protocols which the Internet uses to operate are all specified in RFCs, for instance. However, RFCs cover many topics in addition to standards, such as introductions to new research ideas and status memos about the Internet. While few RFCs are standards, almost all Internet standards are recorded in RFCs."

2 The length of the monitoring interval does not unfairly inflate or deflate a device's uptime percentage. Although we recommend short monitoring intervals for faster detection of outages, longer intervals, given a sufficient amount of historical data, should give an accurate picture of a device's stability.