Understanding the Server Load Metric of Your Server

Monitoring server load is an essential part of managing a server. Keeping the load in check protects your hardware and helps prevent outages and downtime. This article explains how to find your server load, how to interpret it, and what you can do to manage it.

Once you have access to your server, you can check the server load by running top. Top is a utility that displays a live summary of system activity, including uptime, running tasks, CPU and memory usage, and server load. Running the command will give you a screen that looks something like this:
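If you just want to capture that summary once rather than watch it update live, top can be run non-interactively. This is a minimal sketch assuming the common procps version of top found on most Linux distributions:

```shell
# Run top once in batch mode (-b) for a single iteration (-n 1)
# and keep only the five-line summary header, which includes
# uptime, the load averages, task counts, and CPU/memory usage.
top -b -n 1 | head -n 5
```

Batch mode is also handy for logging: you can redirect this output to a file from a cron job to build a simple history of the load over time.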

Running top will show several metrics including Tasks, Server Load, and CPU/RAM usage.

In the upper right-hand corner, you will see “load average” followed by three numbers. These are averages over three windows of time: the last one, five, and fifteen minutes, respectively. Each number represents the count of processes currently running plus the number waiting to run. Since the processor can only execute so many processes at once, the metric captures the full demand on the server by including queued processes as well as running ones. More processes, running and queued, means a higher server load.
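You don't have to open top to read these three numbers. On Linux they are exposed directly in the proc filesystem, and the uptime command prints them too:

```shell
# The first three fields of /proc/loadavg are the
# 1-, 5-, and 15-minute load averages shown by top.
cat /proc/loadavg

# uptime prints the same three averages in a human-readable line.
uptime
```

Reading /proc/loadavg is the easiest way to feed the load average into scripts, since the fields are plain space-separated numbers.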

To build some intuition, imagine a server with a single-core processor. Think of the core as a single runway at an airport, with an air traffic controller coordinating planes landing and departing. If there are exactly enough planes to keep the runway at 100% utilization with none waiting, the run-queue length is 1.0. If there are more planes than the runway can handle at one time, a line forms and the number goes up: twice as many planes means a run-queue length of 2.0. Conversely, half as many planes means it drops to 0.50.

That doesn’t mean you want to sit at 1.0, however. Short spikes are fine, but if the load is consistently at or above 1.0, you should investigate the server for possible issues. This is where the set of three averages comes in: use the 15-minute average to gauge whether the load is sustained and a real problem may exist. A healthy long-term load is around 0.70. Monitor this metric consistently so you can act proactively and prevent downtime.
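A check like this is easy to script. The sketch below is an illustrative example, not an official tool: it reads the 15-minute average from /proc/loadavg and warns when it crosses the 0.70 guideline mentioned above (the threshold and messages are our own choices).

```shell
#!/bin/sh
# Hypothetical watchdog sketch: warn if the 15-minute load
# average exceeds 0.70 on a single-core server.
fifteen=$(awk '{print $3}' /proc/loadavg)

# awk handles the floating-point comparison; exit 0 means "over threshold".
if awk -v l="$fifteen" 'BEGIN { exit !(l > 0.70) }'; then
    echo "WARNING: sustained 15-minute load is $fifteen (above 0.70)"
else
    echo "OK: 15-minute load is $fifteen"
fi
```

Run from cron every few minutes and paired with an email or chat alert, even a simple script like this gives you the proactive monitoring the load average is meant to enable.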

What about multi-core processors? The rule of thumb is 1.0 per core. So if you have a 4-core processor, you are looking at a ceiling of 4.0 (or rather 2.80, which is 70% of 4.0). You can determine how many cores you have by running nproc on your server, which prints the total core count. This applies to multi-processor servers as well: it is the total number of cores, regardless of how they are arranged across sockets, that gives you an accurate baseline for interpreting the server load metric.

Running nproc will give you the total number of cores on your system.
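Putting the two commands together, you can normalize the load by the core count so the 1.0-per-core rule of thumb becomes a single number. This is a small sketch under the assumptions above (load-per-core near or above 1.0 means the machine is saturated):

```shell
# Divide the 1-minute load average by the number of cores
# to get a per-core load figure comparable across machines.
cores=$(nproc)
load1=$(awk '{print $1}' /proc/loadavg)
awk -v l="$load1" -v c="$cores" \
    'BEGIN { printf "load per core: %.2f\n", l / c }'
```

A value of around 0.70 here corresponds to the healthy long-term target discussed earlier, no matter how many cores the server has.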

To recap, the basic rule of thumb is 1.0 per core. If you’re seeing this in the 1-minute average, you might be ok. If you’re seeing that, or a higher number, consistently in the 5 or 15-minute average, it’s time to investigate before things get worse.

We know that constantly checking these metrics can be a pain. At Hivelocity, we offer several Managed Services packages that do all of this work for you. We install tools that actively monitor this and many other metrics on your server and alert us if there are any issues. You get the peace of mind of knowing your hardware is carefully watched without having to monitor it yourself, which means less risk of downtime and less of your time and energy spent. To learn more about adding Managed Services to one of your servers, or purchasing a Managed Server from Hivelocity, open a chat session and our account managers will be happy to assist.