HelioHost Posted September 27, 2020

We've just discovered that some of our VPS may be affected by a memory leak. Ubuntu 20 definitely has this issue, and other OS choices may as well. Here are some symptoms you can check to see whether this issue is affecting your VPS:

/var/log/journal/ is huge:

    root@krydos5:/var/log/journal# du -sh /var/log/journal
    2.5G /var/log/journal

The journald process is using a lot of memory. This command shows the percentage of total system memory that journald is using. In this example, journald is using 23.6%:

    root@krydos5:/var/log/journal# ps -o %mem,command ax|grep -v grep|grep journald
    23.6 /lib/systemd/systemd-journald

The journal log is full of lines about sda, repeated every few seconds:

    root@krydos5:/home/krydos# journalctl -xe
    Sep 27 15:42:24 krydos5.heliohost.org multipathd[685]: sda: add missing path
    Sep 27 15:42:24 krydos5.heliohost.org multipathd[685]: sda: failed to get udev uid: Invalid argument
    Sep 27 15:42:24 krydos5.heliohost.org multipathd[685]: sda: failed to get sysfs uid: Invalid argument
    Sep 27 15:42:24 krydos5.heliohost.org multipathd[685]: sda: failed to get sgio uid: No such file or directory

The /dev/disk/by-id/ directory doesn't contain any entries beginning with scsi:

    root@krydos5:/home/krydos# ls -la /dev/disk/by-id/
    total 0
    drwxr-xr-x 2 root root 60 Sep 14 03:53 .
    drwxr-xr-x 7 root root 140 Sep 14 03:53 ..
    lrwxrwxrwx 1 root root 9 Sep 14 03:53 ata-VMware_Virtual_SATA_CDRW_Drive_00000000000000000001 -> ../../sr0

If your VPS shows these signs, it is affected by this memory leak. The temporary solution is to restart journald to recover the memory and delete old logs to recover the disk space. The better, permanent solution is for us to shut down your VPS, edit its hardware configuration, and boot it back up for you. This results in less than 10 minutes of downtime.
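The temporary workaround above can be sketched as a small bash script. This is a minimal sketch, not an official HelioHost tool: the 200M size cap, the `--apply` flag, and the `journalctl --rotate` step are assumptions added here. Without `--apply` it only prints the commands it would run, so you can review them first.

```bash
#!/usr/bin/env bash
# Minimal sketch of the temporary workaround: restart journald to release its
# memory, then trim old journal files to reclaim disk space.
# Assumptions (not from the post): the 200M cap, the --apply flag, and the
# rotate step. Without --apply it only prints what it would run.
set -euo pipefail

# Hypothetical size cap for the journal after cleanup; override via env.
JOURNAL_MAX_SIZE="${JOURNAL_MAX_SIZE:-200M}"

APPLY=0
if [ "${1:-}" = "--apply" ]; then
  APPLY=1
fi

# Execute a command for real only when --apply was given; otherwise echo it.
run() {
  if [ "$APPLY" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

temp_fix() {
  # Restarting the service frees the memory journald has accumulated.
  run systemctl restart systemd-journald
  # Vacuuming only removes archived files, so archive the active ones first.
  run journalctl --rotate
  # Delete the oldest archived journal files until they fit under the cap.
  run journalctl --vacuum-size="$JOURNAL_MAX_SIZE"
}

temp_fix
```

For example, save it under a name of your choice, run it once with `sudo bash` to see the planned commands, then run it again with `--apply`. Since the underlying cause is still there, memory use and log size will grow back until the hardware configuration is changed.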
This memory leak happens because we've been using the default ESXi configuration to create new VPS, and apparently that configuration doesn't expose the SCSI UUID to the OS. The OS tries to check the UUID every few seconds and fills the logs quickly. Now that we know about it, we can configure the hardware to expose the SCSI UUID to the OS, so no VPS we sell from now on should have this issue.
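Once the hardware has been configured to expose the SCSI UUID, one way to confirm it is to re-run the /dev/disk/by-id/ check from the symptom list and look for scsi entries. The sketch below wraps that check in a bash function; the `has_scsi_ids` name and the `BY_ID_DIR` override are hypothetical, added only so the check can be pointed at another directory. If the entries appear, the repeated multipathd "failed to get ... uid" messages should also stop.

```bash
#!/usr/bin/env bash
# Sketch: confirm that the guest OS now sees SCSI IDs for its disks.
# has_scsi_ids and BY_ID_DIR are hypothetical names used for illustration.
set -euo pipefail

# Return 0 if the given directory holds at least one entry named scsi-*.
has_scsi_ids() {
  local dir="$1"
  local entry
  for entry in "$dir"/scsi-*; do
    # An unmatched glob stays literal, so check the entry really exists.
    # -L also accepts symlinks whose target is missing.
    if [ -e "$entry" ] || [ -L "$entry" ]; then
      return 0
    fi
  done
  return 1
}

BY_ID_DIR="${BY_ID_DIR:-/dev/disk/by-id}"
if has_scsi_ids "$BY_ID_DIR"; then
  echo "OK: $BY_ID_DIR has scsi-* entries"
else
  echo "WARN: no scsi-* entries in $BY_ID_DIR; the SCSI UUID is probably still hidden" >&2
fi
```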