From: | mgbii bax <gezeala(at)gmail(dot)com> |
---|---|
To: | "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org> |
Subject: | systemd deletes shared memory segment in /dev/shm/Postgresql.NNNNNN |
Date: | 2016-01-22 00:07:50 |
Message-ID: | CAJKO3mV+k5d0Cg4RYwovmEfMphieT57X4KqZv-RhWxzEpu1fJQ@mail.gmail.com |
Lists: | pgsql-admin |
We were hit by an interesting addition to systemd: it appears that logging in to and out of the machine with the user account used to start the postgres service has a catastrophic effect. A systemd process deleted the PostgreSQL.NNNN file in /dev/shm (tmpfs).
Errors:

    Jan 21 10:30:01 stg1 systemd: Started Session 3396 of user admin.
    Jan 21 10:30:01 stg1 systemd: Starting Session 3396 of user admin.
    Jan 21 10:30:01 stg1 postgres[31239]: [3-1] FATAL: semctl(13139971, 11, SETVAL, 0) failed: Invalid argument
    Jan 21 10:30:01 stg1 postgres[28042]: [3-1] LOG: server process (PID 31239) exited with exit code 1
    Jan 21 10:30:01 stg1 postgres[28042]: [4-1] LOG: terminating any other active server processes
    Jan 21 10:30:01 stg1 postgres[28047]: [3-1] WARNING: terminating connection because of crash of another server process
    Jan 21 10:30:01 stg1 postgres[28047]: [3-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    Jan 21 10:30:01 stg1 postgres[28047]: [3-3] HINT: In a moment you should be able to reconnect to the database and repeat your command.
    Jan 21 10:30:01 stg1 postgres[28042]: [5-1] LOG: all server processes terminated; reinitializing
    Jan 21 10:30:01 stg1 postgres[28042]: [6-1] LOG: could not remove shared memory segment "/PostgreSQL.1804289383": No such file or directory
    Jan 21 10:30:01 stg1 postgres[28042]: [7-1] LOG: semctl(13041664, 0, IPC_RMID, ...) failed: Invalid argument
    Jan 21 10:30:01 stg1 postgres[28042]: [8-1] LOG: semctl(13074433, 0, IPC_RMID, ...) failed: Invalid argument
    Jan 21 10:30:01 stg1 postgres[28042]: [9-1] LOG: semctl(13107202, 0, IPC_RMID, ...) failed: Invalid argument
    Jan 21 10:30:01 stg1 postgres[28042]: [10-1] LOG: semctl(13139971, 0, IPC_RMID, ...) failed: Invalid argument
    Jan 21 10:30:01 stg1 postgres[28042]: [11-1] LOG: semctl(13172740, 0, IPC_RMID, ...) failed: Invalid argument
    Jan 21 10:30:01 stg1 postgres[31260]: [12-1] LOG: database system was interrupted; last known up at 2016-01-21 10:23:17 PST
    Jan 21 10:30:01 stg1 postgres[31260]: [13-1] LOG: database system was not properly shut down; automatic recovery in progress
    Jan 21 10:30:01 stg1 postgres[31260]: [14-1] LOG: record with zero length at 130/66154E90
    Jan 21 10:30:01 stg1 postgres[31260]: [15-1] LOG: redo is not required
    Jan 21 10:30:01 stg1 postgres[31260]: [16-1] LOG: MultiXact member wraparound protections are now enabled
    Jan 21 10:30:01 stg1 postgres[28042]: [12-1] LOG: database system is ready to accept connections
    Jan 21 10:30:01 stg1 postgres[31267]: [12-1] LOG: autovacuum launcher started
    Jan 21 10:30:26 stg1 systemd: Removed slice user-1001.slice.
    Jan 21 10:30:26 stg1 systemd: Stopping user-1001.slice.
    Jan 21 10:30:35 stg1 systemd: Created slice user-1001.slice.
    Jan 21 10:30:35 stg1 systemd: Starting user-1001.slice.
    Jan 21 10:30:35 stg1 systemd-logind: New session 3397 of user admin.

    $ psql postgres
    psql: FATAL: semctl(11337731, 11, SETVAL, 0) failed: Invalid argument

The log shows PG crashing and restarting; connecting again then works:

    $ psql postgres
    psql (9.4.5)
    Type "help" for help.

    postgres=#

The PostgreSQL file in /dev/shm (tmpfs) appears to have been removed by a systemd process:

    $ ls -lt /dev/shm/
    total 84
    -rw------- 1 admin admin     3916 Jan 21 09:05 PostgreSQL.1804289383  ==> deleted, causing the errors above
    -r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-3708236591
    -r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-4055075926
    -r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-3910933030
    -r-------- 1 gdm   gdm   67108904 Jan 20 18:38 pulse-shm-979612067
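
If you want to confirm this kind of deletion from the logs rather than by eye, a small shell sketch like the one below could help. It assumes two things: segment names appear in the quoted form seen in the log above ("/PostgreSQL.1804289383"), and POSIX shared memory names map to files under /dev/shm, as they do on Linux with glibc. The function name pg_dsm_check is hypothetical.

```sh
# Sketch: report whether each POSIX shared memory segment named in PostgreSQL
# log lines (quoted as "/PostgreSQL.<digits>") still exists under the shm dir.
# pg_dsm_check is a hypothetical helper, not part of PostgreSQL or systemd.

# Usage: pg_dsm_check LOGFILE [SHM_DIR]   (SHM_DIR defaults to /dev/shm)
pg_dsm_check() {
    logfile=$1
    shmdir=${2:-/dev/shm}
    # Extract each distinct quoted segment name, then strip the quotes and
    # the leading slash so it can be joined to the shm directory.
    grep -o '"/PostgreSQL\.[0-9][0-9]*"' "$logfile" | tr -d '"/' | sort -u |
    while read -r seg; do
        if [ -e "$shmdir/$seg" ]; then
            echo "present: $shmdir/$seg"
        else
            echo "missing: $shmdir/$seg"
        fi
    done
}
```

For example, run `pg_dsm_check /var/log/messages` as root on a default CentOS 7 syslog setup. The `semctl(...) failed: Invalid argument` lines concern System V semaphores, which this sketch does not check. `ipcs -s` lists those.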

OS:

    $ cat /etc/centos-release
    CentOS Linux release 7.1.1503 (Core)

Postgres version:

    postgres=# select version();
    -[ RECORD 1 ]---------------------------------------------------------------------------------------------------------
    version | PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

    $ cat /etc/systemd/logind.conf
    #  This file is part of systemd.
    #
    #  systemd is free software; you can redistribute it and/or modify it
    #  under the terms of the GNU Lesser General Public License as published by
    #  the Free Software Foundation; either version 2.1 of the License, or
    #  (at your option) any later version.
    #
    # Entries in this file show the compile time defaults.
    # You can change settings by editing this file.
    # Defaults can be restored by simply deleting this file.
    #
    # See logind.conf(5) for details.

    [Login]
    #NAutoVTs=6
    #ReserveVT=6
    #KillUserProcesses=no
    #KillOnlyUsers=
    #KillExcludeUsers=root
    #InhibitDelayMaxSec=5
    #HandlePowerKey=poweroff
    #HandleSuspendKey=suspend
    #HandleHibernateKey=hibernate
    #HandleLidSwitch=suspend
    #HandleLidSwitchDocked=ignore
    #PowerKeyIgnoreInhibited=no
    #SuspendKeyIgnoreInhibited=no
    #HibernateKeyIgnoreInhibited=no
    #LidSwitchIgnoreInhibited=yes
    #IdleAction=ignore
    #IdleActionSec=30min
    #RuntimeDirectorySize=10%    ==> new entry
    #RemoveIPC=yes               ==> new entry
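
Every entry in the file above is commented out, so RemoveIPC falls back to its compiled-in default. The commented line indicates that default is yes. The sketch below checks which value a given logind.conf actually sets. It assumes the usual systemd rules: commented lines are ignored, and when a key is repeated the last uncommented assignment wins. It does not look at any drop-in configuration files, and the function name logind_remove_ipc is hypothetical.

```sh
# Sketch: print the RemoveIPC= value set in a logind.conf-style file.
# Assumes commented lines are ignored and the last assignment wins;
# drop-in files are not examined. logind_remove_ipc is a hypothetical name.

# Usage: logind_remove_ipc [CONF]   (default: /etc/systemd/logind.conf)
logind_remove_ipc() {
    conf=${1:-/etc/systemd/logind.conf}
    # Only lines starting with optional whitespace + "RemoveIPC=" match,
    # so "#RemoveIPC=yes" is skipped; keep the value of the last match.
    val=$(sed -n 's/^[[:space:]]*RemoveIPC=[[:space:]]*//p' "$conf" | tail -n 1)
    if [ -n "$val" ]; then
        echo "$val"
    else
        echo "unset (compiled-in default applies)"
    fi
}
```

On the file shown above, this would print `unset (compiled-in default applies)`.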

The culprit could be a recent install that updated systemd to version 219:

    Jan 19 13:29:23 Updated: systemd-libs-219-19.el7.x86_64
    Jan 19 13:29:28 Updated: systemd-219-19.el7.x86_64
    Jan 19 13:29:39 Updated: systemd-sysv-219-19.el7.x86_64
    Jan 19 13:29:40 Updated: systemd-python-219-19.el7.x86_64

Is anybody on the list having the same issue? As a workaround, we changed the two new entries in logind.conf from:

    #RuntimeDirectorySize=10%
    #RemoveIPC=yes

to:

    RuntimeDirectorySize=1%
    RemoveIPC=no

Setting RuntimeDirectorySize to 1% is optional. Whenever a user logs in to the server (ssh or otherwise), a new tmpfs is mounted at /run/user/UID with a size of 10% of RAM (the machine has 512GB). This also looks like a new change that came with the systemd update.

Before mods:

    $ mount | grep tmpfs
    devtmpfs on /dev type devtmpfs (rw,nosuid,size=264004800k,nr_inodes=66001200,mode=755)
    tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
    tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
    tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
    tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=42,gid=42)  ==> gdm
    tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700)  ==> root
    tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=1001,gid=1001)  ==> some user, ~51G tmpfs (new feature?)
    tmpfs on /run/user/6301 type tmpfs (rw,nosuid,nodev,relatime,size=52802808k,mode=700,uid=6301,gid=10000)  ==> some user

Before mods:

    $ df -h | grep tmpfs
    devtmpfs        252G     0  252G   0% /dev
    tmpfs           252G   84K  252G   1% /dev/shm
    tmpfs           252G  492M  252G   1% /run
    tmpfs           252G     0  252G   0% /sys/fs/cgroup
    tmpfs            51G     0   51G   0% /run/user/42
    tmpfs            51G     0   51G   0% /run/user/0

After mods:

    $ mount | grep tmpfs
    devtmpfs on /dev type devtmpfs (rw,nosuid,size=264004800k,nr_inodes=66001200,mode=755)
    tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
    tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
    tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
    tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=42,gid=42)
    tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700)
    tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=5280284k,mode=700,uid=1001,gid=1001)

After mods:

    $ df -h | grep tmpfs
    devtmpfs        252G     0  252G   0% /dev
    tmpfs           252G   88K  252G   1% /dev/shm
    tmpfs           252G   19M  252G   1% /run
    tmpfs           252G     0  252G   0% /sys/fs/cgroup
    tmpfs           5.1G   12K  5.1G   1% /run/user/42
    tmpfs           5.1G     0  5.1G   0% /run/user/0
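
The per-user tmpfs sizes above can be sanity-checked with simple arithmetic. This sketch assumes total RAM is twice the devtmpfs size shown in the mount output (264004800k). That assumption rests on /dev and /dev/shm both reporting 252G, which is consistent with the tmpfs default of half of RAM. systemd's own rounding is not reproduced, so the result only approximately matches the size= values. The function names runtime_dir_kib and kib_to_gib are hypothetical.

```sh
# Sketch: tmpfs size implied by RuntimeDirectorySize=<pct>%, computed as
# pct% of an assumed total memory in KiB, using integer shell arithmetic.
# Hypothetical helpers; systemd's exact rounding is not modelled here.

# Usage: runtime_dir_kib TOTAL_KIB PCT
runtime_dir_kib() {
    echo $(( $1 * $2 / 100 ))
}

# Convert KiB to GiB, rounded to one decimal, for comparison with df -h.
kib_to_gib() {
    awk -v k="$1" 'BEGIN { printf "%.1f\n", k / 1048576 }'
}
```

With 2 × 264004800 = 528009600 KiB, `runtime_dir_kib 528009600 10` prints 52800960, which is about 50.4 GiB. `runtime_dir_kib 528009600 1` prints 5280096, about 5.0 GiB. Both are close to the size=52802808k and size=5280284k values above. GNU df -h rounds sizes up, which is why these show as 51G and 5.1G.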

Setting RemoveIPC to no works: the /dev/shm/PostgreSQL.NNNN file appears to stay intact.
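
The RemoveIPC=no change can be scripted roughly as follows. This is a sketch, not a tested procedure. set_remove_ipc_no is a hypothetical helper, and it assumes GNU sed for `sed -i`, as shipped on CentOS 7. It keeps a .bak copy of the file and leaves RuntimeDirectorySize alone, since that change is optional.

```sh
# Sketch: set RemoveIPC=no in a logind.conf-style file, keeping a .bak copy.
# Hypothetical helper; relies on GNU sed -i. It does not restart logind.

# Usage: set_remove_ipc_no [CONF]   (default: /etc/systemd/logind.conf)
set_remove_ipc_no() {
    conf=${1:-/etc/systemd/logind.conf}
    cp -p "$conf" "$conf.bak" || return 1
    if grep -q '^[[:space:]]*#\{0,1\}[[:space:]]*RemoveIPC=' "$conf"; then
        # Replace the existing entry, commented out or not, with RemoveIPC=no.
        sed -i 's/^[[:space:]]*#\{0,1\}[[:space:]]*RemoveIPC=.*/RemoveIPC=no/' "$conf"
    else
        # No entry at all: append one (assumes [Login] is the only section).
        printf 'RemoveIPC=no\n' >> "$conf"
    fi
}
```

After running this as root against the real /etc/systemd/logind.conf, logind still has to pick up the change. You can do that with `systemctl restart systemd-logind` or a reboot. Restarting logind on a machine with active graphical sessions may be disruptive, so a reboot in a maintenance window is the conservative option.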

This is the forum post I found that seems related:
http://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html
--
regards
marie gezeala bacuño II