Debugging random segfaults

When I moved to London in June, I left my desktop behind and brought only my laptop. Both systems run Arch Linux. I moved to Zurich last month and got my desktop back last week. As it turns out, updating an Arch Linux system after three months is not without danger.

I had already backed up my data from the desktop before I left because I had a feeling that something like this would happen. When I got my desktop back, the first thing I did was let pacman update the system.

Everything seemed fine until segfaults started to show up. They showed up in my R scripts, in vlc, in evince, in xdot, and so on.

Segmentation fault (core dumped)

I didn't find anything in the archlinux.org news, but apparently segfaults in random programs are a known early symptom of failing RAM. I figured failing RAM would not crash the same programs every time, but I had no other leads.

After failing to run memtest, I remembered an amazing talk by Julia Evans in which she mentions how great strace is for debugging.

I ran strace on some of the programs that were giving me segfaults and they all seemed to be looking for a file named libnvidia-egl-wayland.so.

open("/usr/lib/tls/x86_64/libnvidia-egl-wayland.so.", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64", 0x7ffe039bc1c0) = -1 ENOENT (No such file or directory)
open("/usr/lib/tls/libnvidia-egl-wayland.so.", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls", 0x7ffe039bc1c0)    = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64/libnvidia-egl-wayland.so.", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64", 0x7ffe039bc1c0) = -1 ENOENT (No such file or directory)
open("/usr/lib/libnvidia-egl-wayland.so.", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
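
For anyone who wants to try the same thing, here is a minimal sketch of how a trace like the one above can be boiled down to the files a program failed to find. The `missing_files` helper is a name made up for this example, not part of strace, and it assumes the quoted path is the first quoted string on each line, as in the output above.

```shell
# Hypothetical helper: read strace output on stdin and print the unique
# paths of calls that failed with ENOENT (No such file or directory).
missing_files() {
    # Keep only failed lookups, then extract the first double-quoted string.
    grep 'ENOENT' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}
```

Assuming strace is installed, it could be used as `strace -f -e trace=file evince 2>&1 | missing_files`, where evince stands in for whichever program is crashing. strace writes to stderr, hence the `2>&1`.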

After some googling, I found out that this may be caused by Intel microcode no longer being loaded automatically. I'm still not sure why. Apparently the only thing I had to do was rebuild my grub config to make sure that the Intel microcode would be loaded during boot.

$ sudo grub-mkconfig -o /boot/grub/grub.cfg
Generating grub configuration file ...
Found Intel Microcode image
[...]
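
To double-check the result before rebooting, something like the sketch below can confirm that the generated config actually references the microcode image. The `has_microcode_initrd` helper is hypothetical, and it assumes the image is named `intel-ucode.img` and is listed on an `initrd` line, which is what I would expect on Arch but have not verified for other setups.

```shell
# Hypothetical helper: succeed if the given grub.cfg contains an initrd
# line that loads the Intel microcode image (assumed name: intel-ucode.img).
has_microcode_initrd() {
    grep -Eq '^[[:space:]]*initrd[[:space:]].*intel-ucode\.img' "$1"
}
```

For example, `has_microcode_initrd /boot/grub/grub.cfg && echo "microcode will be loaded"`. After a reboot, `dmesg | grep microcode` should show whether the kernel picked it up.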

Now I could finally clean up all the core dumps.

$ du -sh /var/lib/systemd/coredump 
2.5G  /var/lib/systemd/coredump
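
The cleanup itself can be as simple as deleting the dump files. A minimal sketch follows, assuming the dumps are regular files whose names start with `core.` and that they sit directly in the directory shown above. The `clean_coredumps` helper is a made-up name for illustration.

```shell
# Hypothetical helper: delete core dump files directly inside a directory.
# Defaults to the systemd-coredump location; pass another path to override.
clean_coredumps() {
    find "${1:-/var/lib/systemd/coredump}" -maxdepth 1 -type f -name 'core.*' -delete
}
```

The files are owned by root, so this has to run as root on the real directory. Running `coredumpctl list` first shows which programs crashed, in case any of the dumps are still worth keeping.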
