author: William Carroll <wpcarro@gmail.com> 2022-05-16T19·05-0700
committer: clbot <clbot@tvl.fyi> 2022-05-26T16·48+0000
commit: d100c1f49f93cff9b0643c842cd2ea5e878b1809
tree: 7d04e7095f49dcd0c495f17a80e0614b4a2fd01a
parent: c16a18a7180497ca91b06d8189463c54294cee3f
feat(wpcarro/ava): Support earlyoom r/4142
Strange start to my Monday: I spent ~2h debugging my hanging NixOS machine.
As far as I know I hadn't made any changes to my configuration that would
trigger this, and the hangs were hard to reproduce:
- graphical X sessions hung (once when opening Chrome)
- TTYs hung (during `nix-build` and `rebuild-system`)

Per kn's recommendation: whenever a system is hanging, check whether it's
reachable over the network (e.g. SSH). Since I didn't have my laptop, I
downloaded Termius on my iPhone and used it to mosh into ava, which is a
surprisingly nice UX.

I suspect my machine (with only 8GB of RAM) was OOMing, but I'm not
certain. Thanks to grfn, I installed `earlyoom`. For more commentary, check out
Profpatsch's blog post about this: https://profpatsch.de/notes/preventing-oom
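
For reference, a minimal sketch of the same setup with the thresholds spelled
out. This assumes the `services.earlyoom.freeMemThreshold` and
`services.earlyoom.freeSwapThreshold` options from the nixpkgs earlyoom module;
the values shown mirror the 10% mentioned in the comment in the diff below and
are not something this change sets explicitly:

```nix
# Sketch of a standalone NixOS module; not the layout used in ava/default.nix.
{ ... }:
{
  services.earlyoom = {
    enable = true;
    # Percent of available memory below which earlyoom may act
    # (assumed to match the module default).
    freeMemThreshold = 10;
    # Percent of free swap below which earlyoom may act
    # (assumed to match the module default).
    freeSwapThreshold = 10;
  };
}
```

earlyoom only kills a process when both thresholds are crossed, so on a machine
without swap the memory threshold is effectively the only one that matters.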

What went well:
- Thankfully I installed a Matrix client on my iPhone last week, which allowed
  me to troubleshoot with the #tvl folks

Action items (AIs):
- I'd like some instrumentation, e.g. Prometheus and Loki (for `journald` and
  `dmesg` logs), so that I can accumulate troubleshooting information that
  isn't destroyed when I reboot my machine (which I did half a dozen times
  today).
- Consider adding `git` metadata to `system.nixos.label` to get more useful
  information in a GRUB/EFI context.
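
One way the `system.nixos.label` idea could look, as a rough sketch only. The
`gitRev` binding is a hypothetical placeholder: how the revision actually gets
into the configuration depends on how the system is built, and nothing in this
change implements it:

```nix
# Sketch of a standalone NixOS module; `gitRev` is a made-up placeholder value.
{ config, ... }:
let
  # Hypothetical: short commit hash of the configuration being built.
  gitRev = "abc1234";
in
{
  # Shown in boot menu entries; the label may only contain letters, digits,
  # and the characters ":", "_", "." and "-".
  system.nixos.label = "${config.system.nixos.version}-${gitRev}";
}
```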

More unknowns:
- Why can't I switch back to EFI (from GRUB) for my bootloader?

Change-Id: Ie2a5a15f5c0ead346d50e331fa2937f8f3453960
Reviewed-on: https://cl.tvl.fyi/c/depot/+/5625
Tested-by: BuildkiteCI
Reviewed-by: wpcarro <wpcarro@gmail.com>
Autosubmit: wpcarro <wpcarro@gmail.com>
-rw-r--r-- users/wpcarro/nixos/ava/default.nix | 4
1 file changed, 4 insertions, 0 deletions
diff --git a/users/wpcarro/nixos/ava/default.nix b/users/wpcarro/nixos/ava/default.nix
index 267b46fdf5..36529c3550 100644
--- a/users/wpcarro/nixos/ava/default.nix
+++ b/users/wpcarro/nixos/ava/default.nix
@@ -42,6 +42,10 @@ in
   };
 
   services = wpcarro.common.services // {
+    # Check the amount of available memory and free swap a few times per second
+    # and kill the largest process if both are below 10%.
+    earlyoom.enable = true;
+
     tailscale.enable = true;
 
     openssh.enable = true;