The most surprising thing about Linux CPU and memory monitoring is how much of what top and htop show you is a lie, or at least a deeply misleading simplification.

Let’s see vmstat in action. Run it with a short interval, say every second, and loop it a few times:

vmstat 1 5

You’ll see output like this:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 123456  10000 567890    0    0     5    10  150  300 10  5 80  0  5
 0  0      0 123400  10000 567900    0    0     0     0  120  250  5  2 93  0  0
 2  0      0 123000  10000 568000    0    0    10    20  180  350 15 10 70  0  5
 0  0      0 123000  10000 568000    0    0     0     0  110  230  2  1 97  0  0
 1  0      0 122900  10000 568100    0    0     8    15  160  320 12  7 75  0  6

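What does this tell us? One caveat first: the first row reports averages since boot, so only the rows after it reflect the one-second intervals you asked for.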

  • r: The number of runnable processes (running or waiting for run time). Values consistently above the number of CPUs mean your CPU is overloaded.
  • b: The number of processes in uninterruptible sleep (usually waiting for I/O). High b means I/O is bottlenecking your system.
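  • swpd: Amount of swap space in use (KiB by default). A non-zero value on its own only means pages were swapped out at some point; if it keeps growing, you’re swapping, which is bad.
  • free: Amount of idle memory. This is the least useful number on Linux, because the kernel deliberately fills otherwise idle memory with buffers and cache.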
  • buff: Memory used as buffers (for block device I/O).
  • cache: Memory used as page cache (for file I/O). This is reclaimable if needed.
  • si, so: Swap-in and swap-out. If these are non-zero, you’re actively swapping to/from disk. Painful.
  • bi, bo: Blocks received from and sent to block devices per second (disk reads and writes).
  • in: Number of interrupts per second.
  • cs: Number of context switches per second. High cs can indicate too many processes or threads fighting for CPU.
  • us, sy, id, wa, st: CPU time spent in user space, system/kernel space, idle, waiting for I/O, and stolen (by hypervisor).
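The column list above maps naturally onto a small parser. The following is a minimal sketch, not a definitive tool: it assumes the two-line header layout shown in the sample output, and the names `SAMPLE` and `parse_vmstat` are made up for illustration.

```python
# Two rows copied from the sample output above, plus the two header lines.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 123456  10000 567890    0    0     5    10  150  300 10  5 80  0  5
 2  0      0 123000  10000 568000    0    0    10    20  180  350 15 10 70  0  5
"""


def parse_vmstat(text):
    """Return one dict per data row, keyed by the column names in the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The first line only holds group labels; the second holds the column names.
    names = lines[1].split()
    rows = []
    for ln in lines[2:]:
        fields = ln.split()
        # Skip anything that is not a full numeric row (e.g. repeated headers).
        if len(fields) != len(names) or not fields[0].isdigit():
            continue
        rows.append(dict(zip(names, map(int, fields))))
    return rows


rows = parse_vmstat(SAMPLE)
for row in rows:
    swapping = row["si"] > 0 or row["so"] > 0
    print(f"r={row['r']} b={row['b']} idle={row['id']}% swapping={swapping}")
```

To run it against a live system, the text returned by running vmstat 1 5 (for example through Python's subprocess module) can be passed to the same function. Because the columns are looked up by header name, the sketch does not depend on their exact order.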

Now, let’s look at top and htop. They give you a per-process view.

top

or

htop

top shows a continuously refreshed view of what’s currently happening. The %CPU and %MEM columns are what most people focus on. But here’s the trick: top’s %CPU is the share of CPU time a task used since the last screen update (every 3 seconds by default), not a lifetime average. It is ps that divides total CPU time by how long the process has been running, so a process that was a CPU hog for an hour but is now idle still shows a high figure in ps while top shows it near zero. In top’s default mode the percentage is also relative to a single CPU, so a multi-threaded process on a multi-core machine can show more than 100%. The TIME+ column shows how much CPU time a process has consumed in total since it started; compare that to how long the process has been running to get its long-run average.
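An interval-based CPU percentage and a lifetime average can tell very different stories about the same process. The sketch below uses made-up numbers and hypothetical function names; it only illustrates the arithmetic behind a per-refresh figure (the kind top displays) and a cputime/realtime figure (the kind ps displays).

```python
def interval_cpu_percent(cpu_before, cpu_after, interval_seconds):
    """CPU seconds used during one refresh interval, relative to one CPU."""
    return 100.0 * (cpu_after - cpu_before) / interval_seconds


def lifetime_cpu_percent(total_cpu_seconds, seconds_since_start):
    """Total CPU seconds divided by how long the process has existed."""
    return 100.0 * total_cpu_seconds / seconds_since_start


# Hypothetical process: fully busy for its first hour, idle for the second.
total_cpu = 3600.0   # CPU seconds consumed so far (what TIME+ accumulates)
age = 7200.0         # wall-clock seconds since the process started

# No CPU time was added during the last 3-second refresh interval.
now = interval_cpu_percent(total_cpu, total_cpu, 3.0)
avg = lifetime_cpu_percent(total_cpu, age)
print(f"interval: {now:.1f}%  lifetime: {avg:.1f}%")
# prints "interval: 0.0%  lifetime: 50.0%"
```

The same process is idle by one measure and a 50% consumer by the other, which is why it matters to know which of the two a tool is showing.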

htop is generally more user-friendly. It provides a more dynamic view and color-coding. You can easily sort by CPU or memory usage. Notice the CPU bars at the top in htop: these give you a real-time, per-core view of CPU utilization, which is much more immediate than the single summary line top shows by default (pressing 1 in top toggles a per-core breakdown).

The real power comes from combining these tools. vmstat gives you the system-wide picture: are we swapping? Is I/O a bottleneck? Is the CPU overloaded overall? top and htop then let you drill down to which process is causing that system-wide behavior. If vmstat shows high wa (I/O wait), look in top or htop for processes in state D (uninterruptible sleep); neither tool shows per-process disk throughput by default, but those processes are the usual suspects. If vmstat shows high r (run queue), sort by CPU in top/htop to see which processes are consuming it.

The distinction between free and buff/cache in vmstat is crucial. Linux aggressively uses otherwise idle memory for caching files. This cache is not lost memory; most of it can be reclaimed quickly if a process needs it. So a low free value is normal and even desirable. You only start worrying when swpd is growing and si/so are active, indicating actual memory pressure that forces pages out to swap.
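The same point can be made with numbers from /proc/meminfo, which is where the memory figures come from. This is a rough sketch under stated assumptions: the sample text is invented for illustration (only MemFree, Buffers and Cached echo the vmstat sample earlier), the helper names are hypothetical, and it relies on the MemAvailable field, the kernel's own estimate of memory usable without swapping, which older kernels do not provide.

```python
# Invented sample in /proc/meminfo format; values are in kB.
SAMPLE_MEMINFO = """\
MemTotal:        1000000 kB
MemFree:          123456 kB
MemAvailable:     650000 kB
Buffers:           10000 kB
Cached:           567890 kB
SwapTotal:       2000000 kB
SwapFree:        2000000 kB
"""


def parse_meminfo(text):
    """Map each field name to its integer value in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def summarize(info):
    """Contrast 'free' with what is actually usable."""
    total = info["MemTotal"]
    return {
        "free_pct": 100.0 * info["MemFree"] / total,
        "cache_pct": 100.0 * (info["Buffers"] + info["Cached"]) / total,
        "available_pct": 100.0 * info["MemAvailable"] / total,
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
    }


summary = summarize(parse_meminfo(SAMPLE_MEMINFO))
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

In this invented sample only about 12% of memory is free, yet roughly 65% is available and no swap is in use, so the machine is not under memory pressure. On a real system, the contents of /proc/meminfo can be read and passed to parse_meminfo in place of the sample.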

The st (stolen time) column in vmstat is often overlooked. This appears in virtualized environments and indicates time that the virtual CPU was ready to run but wasn’t, because the hypervisor was busy with another virtual machine or other host tasks. High st means your VM is not getting its fair share of CPU resources from the host.
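The CPU percentages, st included, are derived from cumulative tick counters on the cpu line of /proc/stat. A minimal sketch of that calculation follows, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the two sample lines and the function names are invented for illustration.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    # parts[0] is the label "cpu"; the next eight values are cumulative ticks.
    # Trailing guest fields are ignored: they are already counted in user/nice.
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def percentages(before, after):
    """Share of each CPU state between two samples, in percent."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


# Two invented samples taken some time apart.
T0 = "cpu  1000 0 500 8000 100 0 0 400 0 0"
T1 = "cpu  1100 0 550 8800 100 0 0 450 0 0"

shares = percentages(parse_cpu_line(T0), parse_cpu_line(T1))
print(f"user={shares['user']:.1f}% idle={shares['idle']:.1f}% "
      f"steal={shares['steal']:.1f}%")
# prints "user=10.0% idle=80.0% steal=5.0%"
```

The counters only ever increase, so a single reading says little; it is the difference between two readings that yields the per-interval percentages.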

When you’re debugging performance issues, start with vmstat to understand the system’s overall state. Is it CPU-bound, I/O-bound, or memory-bound? Then, use top or htop to identify the specific processes contributing to that bottleneck. Don’t get fixated on the free memory number; focus on swap activity and I/O wait.
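That first-pass triage can be written down as a few rules over one vmstat row. The sketch below is only an illustration: the function name, the labels, and especially the thresholds (20% for wa, 10% for st) are arbitrary assumptions, not recommended values.

```python
def triage(row, ncpu, wa_limit=20, st_limit=10):
    """Return coarse labels for one vmstat row (a dict of column name to int)."""
    labels = []
    if row["si"] > 0 or row["so"] > 0:
        labels.append("memory")   # pages are moving to or from swap
    if row["wa"] >= wa_limit:
        labels.append("io")       # CPUs sit idle while I/O is outstanding
    if row["r"] > ncpu:
        labels.append("cpu")      # more runnable tasks than CPUs
    if row["st"] >= st_limit:
        labels.append("steal")    # the hypervisor is withholding CPU time
    return labels or ["ok"]


# Hypothetical rows for a 4-CPU machine.
busy = {"r": 6, "b": 0, "si": 0, "so": 0, "wa": 1, "st": 0}
thrash = {"r": 1, "b": 3, "si": 250, "so": 400, "wa": 35, "st": 0}

print(triage(busy, ncpu=4))     # prints ['cpu']
print(triage(thrash, ncpu=4))   # prints ['memory', 'io']
```

Note that the free column plays no part in any rule. On a live system, os.cpu_count() is one way to obtain the ncpu argument.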

The next thing you’ll likely wrestle with is understanding strace and lsof to pinpoint why a process is causing high I/O or CPU usage.
