Linux performance monitoring is less about watching a dashboard and more about understanding how your system’s components are negotiating for finite resources.
Let’s peek under the hood. Imagine a web server handling 100 requests per second.
# This command shows a live view of overall CPU usage, load average, and per-process usage.
# '-d 1' sets a 1-second refresh delay; press '1' inside top to see each core separately.
top -d 1
top - 10:30:00 up 10 days, 2:15, 1 user, load average: 0.50, 0.60, 0.70
Tasks: 200 total, 1 running, 199 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.0 us, 2.0 sy, 0.0 ni, 93.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16000.0 total, 12000.0 free, 3000.0 used, 1000.0 buff/cache
MiB Swap: 2000.0 total, 2000.0 free, 0.0 used. 12500.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 www-data 20 0 500000 20000 5000 R 10.0 0.1 0:15.20 apache2
5678 root 20 0 30000 2000 1000 S 2.0 0.0 0:05.10 sshd
Here, top shows the apache2 process consuming 10% of one CPU core. The three load average figures cover the last 1, 5, and 15 minutes. The first, 0.50, means that over the last minute an average of half a task was running, waiting for a CPU, or blocked in uninterruptible sleep (usually on I/O). The us (user space) and sy (system/kernel space) percentages in %Cpu(s) indicate where the CPU time is being spent.
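The apache2 process is a web server, and the www-data user is how it typically runs. The R state means it’s currently running or runnable. The memory columns are in KiB by default: 500000 is the virtual memory size (VIRT), 20000 is resident memory actually held in RAM (RES), and 5000 is the part of that resident memory that is shared with other processes (SHR).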
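The us, sy, and id percentages can be derived from the cumulative tick counters on the first line of /proc/stat. The following is a minimal sketch of that calculation, not top’s actual implementation; the function names and the two sample lines are hypothetical, chosen so the result matches the figures above.

```python
# Counter order on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return the first eight tick counters of a 'cpu ...' line as a dict."""
    parts = line.split()
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Share of elapsed CPU time spent in each state between two samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: second[name] - first[name] for name in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


# Hypothetical samples taken one interval apart (values are made up).
sample_before = "cpu 100 0 50 1000 0 0 0 0 0 0"
sample_after = "cpu 105 0 52 1093 0 0 0 0 0 0"

pct = cpu_percentages(sample_before, sample_after)
print(f"us={pct['user']:.1f} sy={pct['system']:.1f} id={pct['idle']:.1f}")
```

On a live system you would read the first line of /proc/stat twice, about a second apart, and pass both lines to cpu_percentages.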
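Memory is often the bottleneck. When the system runs short of physical RAM, it starts moving pages to swap space on disk, which is orders of magnitude slower. The top output shows 12000.0 MiB free and 3000.0 MiB used out of 16000.0 MiB of physical memory, no swap in use, and 12500.0 MiB avail Mem. The avail Mem figure is the better guide, because it estimates how much memory new work could claim without swapping, including cache the kernel can reclaim. This system has plenty of headroom.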
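The same check can be scripted against /proc/meminfo, which is where these figures originate. This is a rough sketch under stated assumptions: the sample text is a trimmed, made-up excerpt sized to match the top output above, the helper names are hypothetical, and the MemAvailable field is assumed to be present (it is on reasonably recent kernels).

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_headroom(info):
    """Return (available memory as % of total, swap in use in kB)."""
    available_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_used_kb = info["SwapTotal"] - info["SwapFree"]
    return available_pct, swap_used_kb


# Made-up excerpt: 16000 MiB total, 12500 MiB available, swap untouched.
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:        12288000 kB
MemAvailable:   12800000 kB
SwapTotal:       2048000 kB
SwapFree:        2048000 kB
"""

available_pct, swap_used_kb = memory_headroom(parse_meminfo(SAMPLE_MEMINFO))
print(f"available: {available_pct:.1f}%  swap used: {swap_used_kb} kB")
```

A falling available percentage together with growing swap use is the pattern worth alerting on; either number alone is less telling.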
Disk I/O is the next suspect. A process may look CPU-bound while actually spending much of its time reading from or writing to disk, and heavy disk activity from one process can slow down others that are waiting on the same device or on memory being written back.
# This command shows extended (-x) I/O statistics per device (-d).
# The '1' is the reporting interval in seconds; the first report covers the time since boot.
iostat -dx 1
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 10.00 0.00 50.00 0.00 100.00 4.00 0.50 10.00 0.00 10.00 5.00 25.00
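In this iostat output, sda is our primary disk. r/s and w/s are reads and writes completed per second. rkB/s and wkB/s are the kilobytes read and written per second. avgqu-sz is the average number of I/O requests queued or in flight, and await is the average time in milliseconds for a request to complete, including the time it spent waiting in the queue. %util is the share of time the device had at least one request outstanding. A high await or avgqu-sz suggests the disk is struggling to keep up. Here, 50.00 writes per second at 100.00 KB/s, with an average wait of 10.00 ms and 25% utilization, is moderately busy but not alarming. Newer sysstat releases rename some of these columns (for example, aqu-sz in place of avgqu-sz) and drop svctm, so your output may look slightly different.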
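The columns in that sample row are related by simple arithmetic, which is a useful sanity check when reading iostat. The sketch below works through two of those relationships using the numbers from the row above; the function names are hypothetical, and the queue-depth estimate relies on Little’s law (requests in flight = completion rate × time per request), which only holds approximately for real devices.

```python
SECTOR_BYTES = 512  # avgrq-sz is reported in 512-byte sectors


def avg_request_kb(kb_per_s, ops_per_s):
    """Average size of one request in KB, or 0.0 when the device is idle."""
    return kb_per_s / ops_per_s if ops_per_s else 0.0


def expected_queue_depth(ops_per_s, await_ms):
    """Little's law estimate of requests queued or in flight."""
    return ops_per_s * (await_ms / 1000.0)


# Values taken from the sda row above.
writes_per_s, write_kb_per_s, await_ms = 50.0, 100.0, 10.0

request_kb = avg_request_kb(write_kb_per_s, writes_per_s)
request_sectors = request_kb * 1024 / SECTOR_BYTES
queue_depth = expected_queue_depth(writes_per_s, await_ms)

print(f"average request: {request_kb:.2f} KB ({request_sectors:.2f} sectors)")
print(f"estimated queue depth: {queue_depth:.2f}")
```

Both results line up with the avgrq-sz and avgqu-sz columns in the sample, which suggests the row is internally consistent. When the numbers on a real system disagree sharply, the sampling interval is usually too short to average out bursts.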
Network performance is crucial for anything connected. High latency or low throughput can cripple applications.
# This command shows network interface statistics.
# The '1' for 1-second refresh.
sar -n DEV 1
Linux 5.15.0-76-generic (my-server) 08/15/2023 _x86_64_ (4 CPU)
10:35:00 IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
10:35:01 eth0 100.00 200.00 10.00 20.00 0.00 0.00 0.00 0.00
10:35:01 lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sar -n DEV shows per-interface network traffic. rxpck/s and txpck/s are packets received and transmitted per second. rxkB/s and txkB/s are the corresponding kilobytes per second, and %ifutil is the interface utilization relative to its link speed. eth0 is our main network interface. 100 packets received and 200 packets transmitted per second, at 10 KB/s and 20 KB/s, is very light traffic.
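Two derived numbers make this kind of output easier to judge: the average packet size and the share of link capacity in use. The following is an illustrative sketch using the eth0 figures above. The 1000 Mbit/s link speed is an assumption (the sample does not state it), the function names are hypothetical, and a kilobyte is taken as 1024 bytes.

```python
def avg_packet_bytes(kb_per_s, packets_per_s, bytes_per_kb=1024):
    """Average payload per packet in bytes, or 0.0 when there is no traffic."""
    if not packets_per_s:
        return 0.0
    return kb_per_s * bytes_per_kb / packets_per_s


def link_utilization_pct(rx_kb_s, tx_kb_s, link_mbit, full_duplex=True,
                         bytes_per_kb=1024):
    """Utilization as a percentage of link speed.

    On a full-duplex link each direction has its own capacity, so only the
    busier direction counts; on half-duplex both directions share it.
    """
    busiest = max(rx_kb_s, tx_kb_s) if full_duplex else rx_kb_s + tx_kb_s
    bits_per_s = busiest * bytes_per_kb * 8
    return 100.0 * bits_per_s / (link_mbit * 1_000_000)


# eth0 figures from the sample; the link speed is an assumed 1 Gbit/s.
rx_size = avg_packet_bytes(10.0, 100.0)
tx_size = avg_packet_bytes(20.0, 200.0)
utilization = link_utilization_pct(10.0, 20.0, link_mbit=1000)

print(f"avg rx packet: {rx_size:.1f} B, avg tx packet: {tx_size:.1f} B")
print(f"link utilization: {utilization:.4f}%")
```

Small average packet sizes combined with a high packet rate point to chatty traffic that stresses the CPU more than the link, which is why the packet and byte columns are worth reading together.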
The kernel itself plays a massive role. It’s the intermediary for all these operations. Understanding its state can reveal subtle issues.
# This command shows kernel-level statistics.
# Specifically, it shows memory management, CPU scheduling, and I/O statistics.
# The '1' is for 1-second refresh.
vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 12000000 100000 12000000 0 0 10 50 100 200 5 2 93 0 0
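In vmstat, r is the number of processes running or waiting for a CPU, and b is the number in uninterruptible sleep (often waiting for I/O). swpd is the amount of swap in use, free is idle memory, buff is memory used for buffers, and cache is the page cache, all in KiB by default. si and so are the amounts swapped in from and out to disk per second; bi and bo are blocks received from and sent to block devices per second. in is interrupts per second, and cs is context switches per second. High in or cs can indicate a busy system or inefficient scheduling. Note that the first line vmstat prints is an average since boot; only the following lines reflect the current interval.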
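If you want to watch these columns from a script, pairing each value with its column name is the first step. This is a minimal sketch that parses captured vmstat text; the function names and the warning thresholds are hypothetical and only meant to illustrate the interpretation described above.

```python
def parse_vmstat(output):
    """Pair vmstat's column names with the integers on each data row."""
    lines = [line.split() for line in output.strip().splitlines()]
    # The second header line starts with the 'r' column.
    header = next(cols for cols in lines if cols and cols[0] == "r")
    rows = []
    for cols in lines:
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            rows.append(dict(zip(header, map(int, cols))))
    return rows


def pressure_flags(row, cores):
    """Return human-readable warnings for one row (illustrative thresholds)."""
    flags = []
    if row["r"] > cores:
        flags.append("run queue longer than core count")
    if row["b"] > 0:
        flags.append("tasks blocked in uninterruptible sleep")
    if row["si"] > 0 or row["so"] > 0:
        flags.append("swapping in progress")
    return flags


# The sample output shown above.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 12000000 100000 12000000 0 0 10 50 100 200 5 2 93 0 0
"""

rows = parse_vmstat(SAMPLE_VMSTAT)
flags = pressure_flags(rows[0], cores=4)
print(rows[0]["cs"], "context switches/s;", flags or "no pressure flags")
```

Looking the header up by name rather than by position keeps the parser working if a vmstat version adds or reorders columns.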
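The load average displayed by top counts both runnable processes and those in uninterruptible sleep, averaged over 1, 5, and 15 minutes. A load average consistently higher than the number of CPU cores means more work is queued than the system can run at once. Because uninterruptible sleep is included, the cause may be slow I/O rather than a shortage of CPU, so check the wa and b figures before blaming the processor.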
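Dividing the load average by the core count gives a number that is comparable across machines. A short sketch, assuming the three leading fields of /proc/loadavg and the 4-CPU machine from the sar header; the function name and the sample string are hypothetical.

```python
def load_per_core(loadavg_text, cores):
    """Return the 1, 5, and 15 minute load averages divided by core count."""
    one, five, fifteen = (float(v) for v in loadavg_text.split()[:3])
    return one / cores, five / cores, fifteen / cores


# Made-up /proc/loadavg content matching the top header above.
SAMPLE_LOADAVG = "0.50 0.60 0.70 1/200 12345"

per_core = load_per_core(SAMPLE_LOADAVG, cores=4)
print(", ".join(f"{value:.3f}" for value in per_core))
```

Values that stay above 1.0 per core are the ones to investigate. On a live system, os.getloadavg() and os.cpu_count() from the Python standard library supply the same inputs without reading the file directly.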
When dealing with network performance, understanding the TCP/IP stack’s behavior is key.
# This command shows cumulative per-protocol counters (IP, ICMP, TCP, UDP) since boot.
netstat -s
Ip:
12345 total packets received
0 forward packets
0 incoming packets discarded
12300 incoming packets delivered
...
Tcp:
12345 active connections openings
50 passive connection openings
100 failed connection attempts
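100 segments retransmitted
...
Udp:
...
Icmp:
...
The netstat -s output provides a wealth of information. For TCP, look at failed connection attempts and segments retransmitted. These counters are cumulative since boot, so what matters is how quickly they grow relative to overall traffic, not their absolute size. A fast-growing retransmission count points to packet loss or congestion; a fast-growing failure count points to unreachable or refusing peers, or a misbehaving application.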
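Because the counters only ever increase, a useful measurement needs two snapshots. The sketch below computes the share of sent segments that were retransmissions between two readings of /proc/net/snmp, the kernel file these TCP counters come from. The sample text is a made-up, trimmed version of the Tcp lines (the real file has more columns), and the function names are hypothetical.

```python
def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp text as a name -> int dict.

    The file holds a pair of 'Tcp:' lines: one with names, one with values.
    """
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    names, values = tcp_lines[0], tcp_lines[1]
    return dict(zip(names, (int(v) for v in values)))


def retransmit_pct(before, after):
    """Retransmitted segments as a percentage of segments sent in between."""
    sent = after["OutSegs"] - before["OutSegs"]
    retransmitted = after["RetransSegs"] - before["RetransSegs"]
    return 100.0 * retransmitted / sent if sent > 0 else 0.0


# Made-up snapshots, trimmed to a few columns for readability.
HEADER = "Tcp: MaxConn ActiveOpens PassiveOpens AttemptFails OutSegs RetransSegs"
SNAPSHOT_1 = HEADER + "\nTcp: -1 12345 50 100 1000 100\n"
SNAPSHOT_2 = HEADER + "\nTcp: -1 12400 52 100 2000 110\n"

first = parse_tcp_counters(SNAPSHOT_1)
second = parse_tcp_counters(SNAPSHOT_2)
rate = retransmit_pct(first, second)
print(f"retransmitted: {rate:.2f}% of segments sent in the interval")
```

What counts as too high depends on the network, so treat any threshold as something to calibrate against your own baseline.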
The kernel’s scheduler is what decides which process gets to run on a CPU. Its efficiency directly impacts overall system responsiveness.
Scheduler statistics do not offer a direct real-time view like top; they are better suited to historical context. A practical starting point is the context switch count (cs) in vmstat: if it is high, investigate which processes are responsible. For deeper dives, you might use perf or eBPF tools.
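One way to attribute context switches to a process is the pair of counters in /proc/PID/status. Voluntary switches happen when the process gives up the CPU itself, typically to wait for I/O or a lock; nonvoluntary switches happen when the scheduler preempts it. The sketch below parses those two fields from captured text. The sample content is made up, and the function names are hypothetical.

```python
WANTED = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def ctxt_switches(status_text):
    """Extract the two context switch counters from /proc/PID/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in WANTED:
            counts[key] = int(value)
    return counts


def preempted_pct(counts):
    """Share of context switches that were forced by the scheduler."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts["nonvoluntary_ctxt_switches"] / total


# Made-up excerpt of a status file.
SAMPLE_STATUS = (
    "Name:\tapache2\n"
    "State:\tR (running)\n"
    "voluntary_ctxt_switches:\t150\n"
    "nonvoluntary_ctxt_switches:\t50\n"
)

counts = ctxt_switches(SAMPLE_STATUS)
print(f"{preempted_pct(counts):.1f}% of switches were preemptions")
```

A process dominated by nonvoluntary switches is competing for CPU, while one dominated by voluntary switches is mostly waiting on something else. As with the other counters, compare two readings taken some time apart.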
The most surprising thing about CPU utilization is that 100% CPU usage by a single process isn’t always a problem; it’s expected for CPU-bound tasks. The real issue arises when multiple processes are vying for CPU and the system can’t keep up, leading to high load averages and increased latency.
When a high load average comes with high CPU utilization, the perf top command can be invaluable. It samples the instruction pointer of running processes and shows you which functions are consuming the most CPU time in real time. This allows you to pinpoint the specific code paths causing the bottleneck, rather than just seeing a high percentage for a whole process. It usually needs root privileges, and symbol names are only readable when debug symbols are available.
The next problem you’ll likely encounter is understanding how to correlate these metrics across different time scales to identify transient issues.