Re: Properly handle OOM death?

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Israel Brewster <ijbrewster(at)alaska(dot)edu>, PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Properly handle OOM death?
Date: 2023-03-13 17:28:12
Message-ID: 17d66a4c-4b05-106e-b2c7-e2babbd31de7@aklaver.com
Lists: pgsql-general

On 3/13/23 10:21 AM, Israel Brewster wrote:
> I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit
> more memory constrained than I would like, such that every week or so
> the various processes running on the machine will align badly and the
> OOM killer will kick in, killing off postgresql, as per the following
> journalctl output:
>
> Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
>
> And the service is no longer running.
>
> When this happens, I go in and restart the postgresql service, and
> everything is happy again for the next week or two.
>
> Obviously this is not a good situation. Which leads to two questions:
>
> 1) Is there some tweaking I can do in the postgresql config itself to
> prevent the situation from occurring in the first place?
> 2) My first thought was to simply have systemd restart postgresql
> whenever it is killed like this, which is easy enough. Then I looked at
> the default unit file, and found these lines:
>
> # prevent OOM killer from choosing the postmaster (individual backends will
> # reset the score to 0)
> OOMScoreAdjust=-900
> # restarting automatically will prevent "pg_ctlcluster ... stop" from working,
> # so we disable it here. Also, the postmaster will restart by itself on most
> # problems anyway, so it is questionable if one wants to enable external
> # automatic restarts.
> #Restart=on-failure
>
> Which seems to imply that the OOM killer should only be killing off
> individual backends, not the entire cluster to begin with - which should
> be fine. And also that adding the Restart=on-failure option is probably
> not the greatest idea. Which makes me wonder what is really going on?
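
If the unit file is doing what its comment describes, the postmaster should carry an oom_score_adj of -900 and each backend should carry 0. The following is a minimal Python sketch for checking that on the running box. It assumes a Linux /proc layout in which every PostgreSQL process reports `postgres` in /proc/<pid>/comm. The helper name `oom_adjustments` is made up for illustration.

```python
from pathlib import Path


def oom_adjustments(proc_root="/proc", name="postgres"):
    """Map PID -> oom_score_adj for every process whose comm equals `name`.

    Assumes a Linux-style /proc: /proc/<pid>/comm and /proc/<pid>/oom_score_adj.
    `proc_root` is a parameter only so the function can be tested against a
    fake directory tree.
    """
    scores = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as "meminfo" or "self"
        try:
            if (entry / "comm").read_text().strip() != name:
                continue
            scores[int(entry.name)] = int((entry / "oom_score_adj").read_text())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the file was unreadable
    return scores
```

Run `print(oom_adjustments())` on the VM and interpret the result as follows:

- If every process shows -900, the backends are not resetting their score.
- If the postmaster shows 0, the OOMScoreAdjust line is probably not in effect.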

You might want to read:

https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
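
The section above suggests strict overcommit (vm.overcommit_memory=2). In that mode the kernel refuses an allocation that would push Committed_AS past CommitLimit, so the query that asked for the memory gets an "out of memory" error instead of the OOM killer picking a victim later. The following is a rough Python sketch for sizing that on a memory-constrained VM. It assumes the standard /proc/meminfo field names and the kernel's documented CommitLimit formula with vm.overcommit_kbytes left at 0. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value} (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo_text):
    """CommitLimit minus Committed_AS; the limit is only enforced when vm.overcommit_memory=2."""
    info = parse_meminfo(meminfo_text)
    return info["CommitLimit"] - info["Committed_AS"]


def expected_commit_limit_kb(mem_total_kb, swap_total_kb,
                             overcommit_ratio=50, hugetlb_kb=0):
    """Kernel formula with vm.overcommit_kbytes = 0 (50 is the default ratio):
    (RAM - huge pages) * overcommit_ratio / 100 + swap."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb
```

On the VM itself, `commit_headroom_kb(open("/proc/meminfo").read())` gives the current margin. With the default ratio of 50 and little swap, the limit can sit well below physical RAM. That is why the docs also mention adjusting vm.overcommit_ratio.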

>
> Thanks.
>
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory
> Geophysical Institute - UAF
> 2156 Koyukuk Drive
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell:  907-328-9145
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
