Postgres Out Of Memory Crash

From: Avi Weinberg <AviW(at)gilat(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Postgres Out Of Memory Crash
Date: 2023-11-02 09:12:53
Message-ID: DB9PR07MB71808BC744552C349DA17E8DCBA6A@DB9PR07MB7180.eurprd07.prod.outlook.com
Lists: pgsql-general

Hi experts,

I'm using a Patroni-managed Postgres installation and have noticed that Postgres has already crashed twice due to running out of memory. I'm using logical replication with around 30-40 active subscribers on this machine. The machine has 128GB of RAM, but only 32GB is allocated to Postgres. How can I find out what is actually causing the out-of-memory condition? Is it a suboptimal Postgres configuration, or something else? This is the postmaster command line:

/usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432 --cluster_name=postgres-cluster --wal_level=logical --hot_standby=on --max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0 --max_locks_per_transaction=64 --track_commit_timestamp=on --max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on
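For reference, this is how I have been reviewing the memory-related settings on the instance (a minimal sketch; the parameter list is just what seemed relevant to me, and everything not set on the command line above is at its default):

-- Memory-related settings and logical-replication limits
-- (logical_decoding_work_mem is available from PG 13 onward)
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers',
               'work_mem',
               'maintenance_work_mem',
               'logical_decoding_work_mem',
               'wal_buffers',
               'max_connections',
               'max_wal_senders',
               'max_replication_slots')
ORDER BY name;

This is the OOM-killer output from the kernel log: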

Oct 27 07:05:31 node2 kernel: postgres invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=993
Oct 27 07:05:31 node2 kernel: postgres cpuset=docker-6ae67e04710619972d3b1ab5d4c69c318d001c2da47fecee121cdc60279a14a0.scope mems_allowed=0
Oct 27 07:05:31 node2 kernel: CPU: 6 PID: 15536 Comm: postgres Kdump: loaded Tainted: G ------------ T 3.10.0-1160.el7.x86_64 #1
Oct 27 07:05:31 node2 kernel: Hardware name: Kontron MSP8040/4008, BIOS Core: 5.11, MSP804x: 1.57.0943FC77 05/06/2020
Oct 27 07:05:31 node2 kernel: Call Trace:
Oct 27 07:05:31 node2 kernel: [<ffffffffa4581340>] dump_stack+0x19/0x1b
Oct 27 07:05:31 node2 kernel: [<ffffffffa457bc60>] dump_header+0x90/0x229
Oct 27 07:05:31 node2 kernel: [<ffffffffa409d008>] ? ep_poll_callback+0xf8/0x220
Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc1b26>] ? find_lock_task_mm+0x56/0xc0
Oct 27 07:05:31 node2 kernel: [<ffffffffa403c8c8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc208d>] oom_kill_process+0x2cd/0x490
Oct 27 07:05:31 node2 kernel: [<ffffffffa4040cdc>] mem_cgroup_oom_synchronize+0x55c/0x590
Oct 27 07:05:31 node2 kernel: [<ffffffffa4040140>] ? mem_cgroup_charge_common+0xc0/0xc0
Oct 27 07:05:31 node2 kernel: [<ffffffffa3fc2974>] pagefault_out_of_memory+0x14/0x90
Oct 27 07:05:31 node2 kernel: [<ffffffffa457a16c>] mm_fault_error+0x6a/0x157
Oct 27 07:05:31 node2 kernel: [<ffffffffa458e8d1>] __do_page_fault+0x491/0x500
Oct 27 07:05:31 node2 kernel: [<ffffffffa458e975>] do_page_fault+0x35/0x90
Oct 27 07:05:31 node2 kernel: [<ffffffffa458a778>] page_fault+0x28/0x30
Oct 27 07:05:31 node2 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-6ae67e0471061997$
Oct 27 07:05:31 node2 kernel: memory: usage 32768000kB, limit 32768000kB, failcnt 144867
Oct 27 07:05:31 node2 kernel: memory+swap: usage 32768000kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice: cache:$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Oct 27 07:05:31 node2 kernel: [13159] 0 13159 239 1 3 0 -998 pause
Oct 27 07:05:31 node2 kernel: [13322] 0 13322 1095 97 8 0 993 dumb-init
Oct 27 07:05:31 node2 kernel: [13335] 0 13335 1156 171 8 0 993 sh
Oct 27 07:05:31 node2 kernel: [13411] 0 13411 1137 98 8 0 993 runsvdir
Oct 27 07:05:31 node2 kernel: [13438] 0 13438 1099 98 7 0 993 runsv
Oct 27 07:05:31 node2 kernel: [13439] 0 13439 1099 98 7 0 993 runsv
Oct 27 07:05:31 node2 kernel: [13440] 101 13440 27026 1186 54 0 993 pgqd
Oct 27 07:05:31 node2 kernel: [13441] 101 13441 155215 8237 101 0 993 patroni
Oct 27 07:05:31 node2 kernel: [19532] 101 19532 1740033 46817 171 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19542] 101 19542 1767874 6713 121 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19546] 101 19546 1740173 1445031 3166 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19547] 101 19547 1740069 20060 171 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19548] 101 19548 1740027 4821 86 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19549] 101 19549 1740283 1011 91 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19550] 101 19550 50320 798 77 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19551] 101 19551 1740524 2097 117 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19552] 101 19552 1740766 2043 100 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19556] 101 19556 1744493 16306 476 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19588] 101 19588 1744522 16353 484 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19598] 101 19598 1745580 22539 602 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19609] 101 19609 5909368 4364063 9947 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19617] 101 19617 1745576 16669 494 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19676] 101 19676 1740468 2208 113 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19744] 101 19744 1741155 19698 615 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19776] 101 19776 1742008 105333 1733 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19793] 101 19793 1740738 58109 631 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19799] 101 19799 1740737 29227 546 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19806] 101 19806 1740435 1393 96 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19811] 101 19811 1741021 3658 139 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19971] 101 19971 1740734 15521 449 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20007] 101 20007 1740435 1394 96 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20537] 101 20537 1891639 121701 407 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20623] 101 20623 1889590 121300 405 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20840] 101 20840 1889591 120970 406 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20851] 101 20851 1891639 122165 408 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20955] 101 20955 1889591 120056 405 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21117] 101 21117 1891639 121217 405 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21328] 101 21328 1891639 120761 404 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21332] 101 21332 1889591 119904 402 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21451] 101 21451 1891639 120435 403 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21988] 101 21988 1889591 121723 407 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 2456] 101 2456 1889592 120006 406 0 993 postgres
Oct 27 07:05:31 node2 kernel: [32261] 101 32261 1878518 100278 365 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 3352] 101 3352 1926434 149555 459 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 3893] 101 3893 1926436 149552 459 0 993 postgres
Oct 27 07:05:31 node2 kernel: [15217] 101 15217 1880573 101541 370 0 993 postgres
Oct 27 07:05:31 node2 kernel: [15246] 101 15246 1879605 101892 368 0 993 postgres
Oct 27 07:05:31 node2 kernel: [14774] 101 14774 1773892 26263 219 0 993 postgres
Oct 27 07:05:31 node2 kernel: [15904] 101 15904 1773886 26721 220 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 9266] 101 9266 1744448 15139 468 0 993 postgres
Oct 27 07:05:31 node2 kernel: [15536] 101 15536 1800641 46553 261 0 993 postgres
Oct 27 07:05:31 node2 kernel: [21552] 101 21552 1918253 139792 444 0 993 postgres
Oct 27 07:05:31 node2 kernel: [22239] 101 22239 1769356 21754 210 0 993 postgres
Oct 27 07:05:31 node2 kernel: [13344] 101 13344 1767312 19434 206 0 993 postgres
Oct 27 07:05:31 node2 kernel: [27568] 101 27568 1743509 11079 327 0 993 postgres
Oct 27 07:05:31 node2 kernel: [27754] 101 27754 1743483 10910 313 0 993 postgres
Oct 27 07:05:31 node2 kernel: [28036] 101 28036 1743550 12152 385 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 7772] 101 7772 1743520 11171 332 0 993 postgres
Oct 27 07:05:31 node2 kernel: [20855] 101 20855 1743564 13723 422 0 993 postgres
Oct 27 07:05:31 node2 kernel: [30883] 101 30883 1743518 11180 333 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31463] 101 31463 1740560 3099 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31466] 101 31466 1740560 3084 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31475] 101 31475 1740560 3101 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31485] 101 31485 1740560 3080 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [31513] 101 31513 1740560 3082 137 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 3321] 101 3321 1753416 13629 164 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 3325] 101 3325 1753416 13756 164 0 993 postgres
Oct 27 07:05:31 node2 kernel: [16159] 101 16159 1743331 6323 310 0 993 postgres
Oct 27 07:05:31 node2 kernel: [32368] 101 32368 1743330 6156 311 0 993 postgres
Oct 27 07:05:31 node2 kernel: [ 1528] 101 1528 2009656 24435 346 0 993 postgres
-- more lines...
Oct 27 07:05:31 node2 kernel: Memory cgroup out of memory: Kill process 19609 (postgres) score 1526 or sacrifice child
Oct 27 07:05:31 node2 kernel: Killed process 19609 (postgres), UID 101, total-vm:23637472kB, anon-rss:16612848kB, file-rss:19156kB, shmem-rss:824248kB
Oct 27 07:05:32 node2 etcd: 2023-10-27 07:05:32.607769 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/default/kubernetes\" " with result "$
Oct 27 07:05:34 node2 kubelet: I1027 07:05:34.624970 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:05:38 node2 kubelet: I1027 07:05:38.823153 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:05:44 node2 kubelet: I1027 07:05:44.011520 24409 container_manager_linux.go:490] [ContainerManager]: Discovered runtime cgroups name: /systemd/system.slice
Oct 27 07:05:44 node2 kubelet: I1027 07:05:44.826318 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:05:48 node2 kubelet: I1027 07:05:48.915249 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:05:55 node2 kubelet: I1027 07:05:55.104931 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:05:58 node2 kubelet: I1027 07:05:58.997936 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:06:01 node2 systemd: Started Session 332365 of user root.
Oct 27 07:06:06 node2 kubelet: I1027 07:06:06.133210 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029905 24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-scheduler-node2" status=Running
Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029966 24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-apiserver-node2" status=Running
Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.029996 24409 kubelet_getters.go:178] "Pod status updated" pod="kube-system/kube-controller-manager-node2" status=Ru$
Oct 27 07:06:08 node2 kubelet: E1027 07:06:08.874227 24409 remote_runtime.go:392] ExecSync 28f30220f15ec2a26071c2175f24f77d1b20870fcb3c24bc659d824223d93deb '/usr/bin$
Oct 27 07:06:08 node2 kubelet: I1027 07:06:08.874326 24409 prober.go:117] Readiness probe for "calico-kube-controllers-7c5b64bf96-q6rjm_kube-system(14519a12-7721-43b$
Oct 27 07:06:09 node2 kubelet: I1027 07:06:09.090193 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:06:09 node2 etcd: 2023-10-27 07:06:09.223921 I | mvcc: store.index: compact 76535976
Oct 27 07:06:09 node2 etcd: 2023-10-27 07:06:09.253047 I | mvcc: finished scheduled compaction at 76535976 (took 28.190376ms)
Oct 27 07:06:15 node2 kubelet: I1027 07:06:15.232655 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:06:19 node2 kubelet: W1027 07:06:19.053762 24409 kubelet_pods.go:880] Unable to retrieve pull secret /registrypullsecret for /-arcmana$
Oct 27 07:06:19 node2 kubelet: I1027 07:06:19.198356 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:06:25 node2 kubelet: I1027 07:06:25.132514 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:06:29 node2 kubelet: I1027 07:06:29.279555 24409 setters.go:86] Using node IP: "x.x.x.x"
Oct 27 07:06:34 node2 kubelet: I1027 07:06:34.596501 24409 kubelet.go:1926] SyncLoop (UPDATE, "api"): "postgres-cluster-0_postgres(5984f099-30b0-4506-873$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005277 W | etcdserver: read-only range request "key:\"/registry/ingress/\" range_end:\"/registry/ingress0\" count_only:$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005533 W | etcdserver: request "header:<ID:1981451119914121853 username:\"etcd-node-node1\" auth_revision:1 > txn:<$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005961 W | etcdserver: read-only range request "key:\"/registry/volumeattachments/\" range_end:\"/registry/volumeattach$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.005996 W | etcdserver: read-only range request "key:\"/registry/crd.projectcalico.org/ipamblocks/\" range_end:\"/regist$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.006012 W | etcdserver: read-only range request "key:\"/registry/leases/kube-system/kube-scheduler\" " with result "rang$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.006092 W | etcdserver: read-only range request "key:\"/registry/namespaces/kube-system\" " with result "range_response_$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.307799 W | etcdserver: read-only range request "key:\"/registry/apiextensions.k8s.io/customresourcedefinitions/\" range$
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.308862 W | etcdserver: read-only range request "key:\"/registry/health\" " with result "range_response_count:0 size:7" $
Oct 27 07:06:36 node2 etcd: 2023-10-27 07:06:36.308906 W | etcdserver: read-only range request "key:\"/registry/namespaces/default\" " with result "r
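In case it helps with an answer: the next time memory starts climbing I intend to map the OS PID of the largest backend (19609 in the list above) back to its session, and to check the logical-replication walsenders, roughly like this (a sketch; it only works while the backend is still alive, and the PID is just the one from this particular crash):

-- Which backend owns a given OS PID, and what is it doing?
SELECT pid, backend_type, application_name, state, backend_start, query
FROM pg_stat_activity
WHERE pid = 19609;  -- PID reported by the OOM killer above

-- State of the logical replication slots and their walsenders
SELECT s.slot_name, s.active_pid, r.state, r.sent_lsn, r.replay_lsn
FROM pg_replication_slots s
LEFT JOIN pg_stat_replication r ON r.pid = s.active_pid
WHERE s.slot_type = 'logical';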
