Re: Properly handle OOM death?

From: Israel Brewster <ijbrewster(at)alaska(dot)edu>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Properly handle OOM death?
Date: 2023-03-13 17:36:29
Message-ID: F478BAD7-5E2B-46EB-A435-F8315480BB09@alaska.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> On Mar 13, 2023, at 9:28 AM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>
> On 3/13/23 10:21 AM, Israel Brewster wrote:
>> I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more memory constrained than I would like, such that every week or so the various processes running on the machine will align badly and the OOM killer will kick in, killing off postgresql, as per the following journalctl output:
>> Mar 12 04:04:23 novarupta systemd[1]: postgresql(at)13-main(dot)service: A process of this unit has been killed by the OOM killer.
>> Mar 12 04:04:32 novarupta systemd[1]: postgresql(at)13-main(dot)service: Failed with result 'oom-kill'.
>> Mar 12 04:04:32 novarupta systemd[1]: postgresql(at)13-main(dot)service: Consumed 5d 17h 48min 24.509s CPU time.
>> And the service is no longer running.
>> When this happens, I go in and restart the postgresql service, and everything is happy again for the next week or two.
>> Obviously this is not a good situation. Which leads to two questions:
>> 1) is there some tweaking I can do in the postgresql config itself to prevent the situation from occurring in the first place?
>> 2) My first thought was to simply have systemd restart postgresql whenever it is killed like this, which is easy enough. Then I looked at the default unit file, and found these lines:
>> # prevent OOM killer from choosing the postmaster (individual backends will
>> # reset the score to 0)
>> OOMScoreAdjust=-900
>> # restarting automatically will prevent "pg_ctlcluster ... stop" from working,
>> # so we disable it here. Also, the postmaster will restart by itself on most
>> # problems anyway, so it is questionable if one wants to enable external
>> # automatic restarts.
>> #Restart=on-failure
>> Which seems to imply that the OOM killer should only be killing off individual backends, not the entire cluster to begin with - which should be fine. And also that adding the restart=on-failure option is probably not the greatest idea. Which makes me wonder what is really going on?
>
> You might want to read:
>
> https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

Good information, thanks. One thing there confuses me though. It says:

Another approach, which can be used with or without altering vm.overcommit_memory, is to set the process-specific OOM score adjustment value for the postmaster process to -1000, thereby guaranteeing it will not be targeted by the OOM killer

Isn’t that exactly what the "OOMScoreAdjust=-900” line in the Unit file does though (except with a score of -900 rather than -1000)?

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
>
>> Thanks.
>> ---
>> Israel Brewster
>> Software Engineer
>> Alaska Volcano Observatory
>> Geophysical Institute - UAF
>> 2156 Koyukuk Drive
>> Fairbanks AK 99775-7320
>> Work: 907-474-5172
>> cell: 907-328-9145
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Joe Conway 2023-03-13 17:36:31 Re: Properly handle OOM death?
Previous Message Adrian Klaver 2023-03-13 17:28:12 Re: Properly handle OOM death?