Since the recent Bloggulus outage, I’ve been keeping a close eye on things.
While the server has mostly been stable, I still noticed the occasional OOM kill after creating backups via pg2s3.
Here is an example from journalctl -u pg2s3
logs (notice that this was happening nearly a month after the most-outage memory increase):
Jul 01 09:00:03 bloggulus pg2s3[32400]: created bloggulus_2024-07-01T09:00:00Z.backup.age
Jul 01 09:00:04 bloggulus pg2s3[32400]: deleted bloggulus_2024-05-04T09:00:00Z.backup.age
Jul 02 09:00:05 bloggulus systemd[1]: pg2s3.service: A process of this unit has been killed by the OOM killer.
Jul 02 09:00:05 bloggulus systemd[1]: pg2s3.service: Main process exited, code=killed, status=9/KILL
Jul 02 09:00:05 bloggulus systemd[1]: pg2s3.service: Failed with result 'oom-kill'.
Jul 02 09:00:05 bloggulus systemd[1]: pg2s3.service: Consumed 9.452s CPU time.
Okay, so it seems like the 1GB of RAM isn’t quite enough when backups are taking place. The server works just fine under normal operation, however, but backups push it over the edge. If only there was a way to “download more RAM” and give the server a bit more breathing room…
Swap Space Link to heading
Enter the swap space! This is a Linux concept for giving servers additional memory capabilities without increasing the amount of physical RAM installed. It works by using a regular file (on the filesystem) for memory overflow. It isn’t as fast as regular RAM, but it is better than having processes getting OOM killed! From All about Linux swap space:
Linux divides its physical RAM (random access memory) into chunks of memory called pages. Swapping is the process whereby a page of memory is copied to the preconfigured space on the hard disk, called swap space, to free up that page of memory. The combined sizes of the physical memory and the swap space is the amount of virtual memory available.
I didn’t realize that my Digital Ocean droplets don’t have swap configured and enabled by default. Thankfully, it is quite easy to set up and will hopefully help eliminate those pesky remaining OOM kills. Digital Ocean has a great guide for setting up and configuring swap space on an Ubuntu server. Initially, I followed these steps manually. Then, once I got things working, I decided to “lock it in” via my Ansible automation.
Ansible Tasks Link to heading
The automation was quite simple: it only takes five tasks!
I used Jeff Geerling’s awesome ansible-role-swap for inspiration and guidance.
In short, these tasks create, initialize, and enable a 1GB swap file on the root filesystem (at /swapfile
).
It also adds an entry to fstab so that the swap space is automatically enabled on subsequent restarts.
Let’s take a look:
- name: Create swapfile
command:
cmd: fallocate -l 1G /swapfile
creates: /swapfile
register: create_swapfile
become: yes
become_user: root
- name: Set swapfile permissions
file:
path: /swapfile
mode: "0600"
become: yes
become_user: root
- name: Initialize swapfile
command:
cmd: mkswap /swapfile
when: create_swapfile is changed
become: yes
become_user: root
- name: Enable swapfile
command:
cmd: swapon /swapfile
when: create_swapfile is changed
become: yes
become_user: root
- name: Add swapfile to fstab
mount:
name: none
src: /swapfile
fstype: swap
opts: sw
state: present
become: yes
become_user: root
Bonus Tuning Link to heading
The Digital Ocean guide also details a few “bonus” settings that can help a server manage its swap more efficiently.
swappiness Link to heading
This setting controls how eager a system is to boot data out of main memory and into the swapfile.
Since we only want the server to use the swap space when absolutely necessary, we adjust this setting to a low value.
The default value is 60
.
- name: Configure swappiness
sysctl:
name: vm.swappiness
value: 10
become: yes
become_user: root
vfs_cache_pressure Link to heading
This setting controls how quickly the server releases directory and inode (file) information from the cache.
Lowering this value causes filesystem data (which can be expensive to retrieve) to remain in the cache for longer periods of time.
The default value is 100
.
- name: Configure vfs_cache_pressure
sysctl:
name: vm.vfs_cache_pressure
value: 50
become: yes
become_user: root
Conclusion Link to heading
You can view all of these tasks together in my devops repo. I also decided that my server role had gotten a bit messy and decided to split it into separate files using Ansible’s include_tasks directive. Now I’ve got a clean separation of tasks which is much easier to navigate and understand.
Since I added swap space to the Bloggulus server, I haven’t seen a single OOM kill. Things have been running smoothly for weeks and only a small portion of the swap is being used:
total used free shared buff/cache available
Mem: 957Mi 226Mi 77Mi 92Mi 653Mi 481Mi
Swap: 1.0Gi 41Mi 982Mi
Let’s hope things stay that way! Thanks for reading.