Core Dumps Explained: From Crash to Root Cause

A Linux core dump is a snapshot of a process's memory and register state at the moment it was killed by a fatal signal such as SIGSEGV. Loaded into gdb, it lets you inspect the crash after the fact.

Let’s see a process crash and then inspect its core dump.

First, we need to enable core dumps. On many distributions the default core file size limit is 0, which means no core file is written.

ulimit -c unlimited

This raises the core file size limit for the current shell and every program it starts. Without it, a crashing program may produce no core file at all.

Now, let’s create a simple C program that will intentionally crash:

#include <stdio.h>
#include <stdlib.h>

void crash_function(void) {
    int *ptr = NULL;
    *ptr = 10; // NULL dereference: undefined behavior, which at -O0 on Linux raises SIGSEGV
}

int main(void) {
    printf("About to crash...\n");
    crash_function();
    printf("This will never be printed.\n");
    return 0;
}

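Save this as crash.c and compile it with debugging information (-g) and without optimization (-O0), so gdb can map the crash back to source lines and variable names:

gcc -g -O0 crash.c -o crash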

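Now, run the program. It will crash, and if ulimit -c unlimited is in effect in this shell, the kernel writes a core file to the process's current working directory (assuming the system's core_pattern setting, discussed below, writes plain files).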

./crash

You’ll see output like:

About to crash...
Segmentation fault (core dumped)

A file named core (or core.<pid> on some systems) should then appear in the directory.

To debug this, we use gdb.

gdb ./crash core

This command loads both the executable and the core dump file into gdb.

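Inside gdb, you'll first see which signal terminated the process (a line such as "Program terminated with signal SIGSEGV, Segmentation fault."), followed by the prompt. The most important command is bt (backtrace), which shows the call stack at the time of the crash.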

(gdb) bt

This lists the chain of calls that led to the crash, innermost frame first. With the program above, you'll see something like this (addresses will differ):

#0  0x0000555555555174 in crash_function () at crash.c:6
#1  0x000055555555520f in main () at crash.c:11

This tells you the crash happened in crash_function at line 6 of crash.c (the *ptr = 10 assignment), and that crash_function was called by main at line 11.

You can then select a stack frame and examine the variables in it. Frame 0 (crash_function) is selected by default, but you can select it explicitly:

(gdb) frame 0

Then, print the pointer that caused the crash:

(gdb) p ptr
$1 = (int *) 0x0

This confirms ptr was NULL, and dereferencing it caused the segmentation fault.

Beyond the backtrace, you can inspect registers (info registers), raw memory (x), and the surrounding source code (list, provided the source file is available).

Where the core file ends up is decided by the kernel setting /proc/sys/kernel/core_pattern. If you don't see a core file, check its value with cat /proc/sys/kernel/core_pattern. If it starts with a pipe, such as |/usr/share/apport/apport --core %s %c %p %u %e, the kernel pipes the dump to that handler program (apport on Ubuntu, systemd-coredump on many other distributions, where coredumpctl can list the stored dumps) instead of writing a file.

To get plain files in the crashing process's working directory, set the pattern to core, or to core.%e.%p to include the executable name and PID, for example with sudo sysctl -w kernel.core_pattern=core. A change made this way lasts until the next reboot.

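A common pitfall is a truncated core file. This happens when the filesystem runs out of space or when ulimit -c is set to a finite value smaller than the process's memory. Always check the size of the generated core file; gdb usually warns when it detects a truncated core, and backtraces from such a file may be incomplete.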

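Another reason for missing core dumps is security restrictions. The kernel writes the core file with the crashing process's credentials, so the process's current working directory (or the directory named in core_pattern) must be writable by the user the process runs as. Processes that changed privilege levels, such as setuid programs, are not dumped at all by default; this is controlled by /proc/sys/fs/suid_dumpable. Finally, core files are created readable only by their owner, so a core left by a root process may need sudo or a chown before you can load it into gdb.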

In a multi-threaded application, gdb selects the thread that received the fatal signal, so bt shows that thread's stack. Use info threads to list all threads, thread <thread_id> to switch to one of them, and thread apply all bt to print every thread's backtrace at once.

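If the executable was compiled with optimizations (-O2, -O3), functions may be inlined and variables kept only in registers, so frames can be missing from the backtrace and gdb may report values as <optimized out>. Where you can reproduce the crash, rebuild with -O0 -g and trigger it again. A core file must be debugged with the exact binary that produced it, so a recompiled executable cannot be used to read an old core.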

Finally, if the core dump file is very large, loading it into gdb can take a long time. Be patient, especially on systems with slower storage.

The next hurdle you’ll likely face is dealing with complex data structures or object states within the core dump, requiring more advanced gdb commands like print with type casting or examining memory addresses directly.