pg_dump crashing

From: Matthias Schmitt <matthias(dot)schmitt(at)mmp(dot)lu>
To: pgsql-general(at)postgresql(dot)org
Subject: pg_dump crashing
Date: 2016-03-15 15:10:44
Message-ID: C2614EC6-38A6-41AB-8208-950319333E84@mmp.lu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

since two weeks I am trying to get PostgreSQL 9.5.1 to run on Debian 8.3. Everything is fine except the daily backup. When calling pg_dump as part of a cron job pg_dump crashes:

2016-03-15 01:00:02 CETFATAL: semctl(23232524, 3, SETVAL, 0) failed: Invalid argument
2016-03-15 01:00:02 CETLOG: server process (PID 22279) exited with exit code 1
2016-03-15 01:00:02 CETLOG: terminating any other active server processes
2016-03-15 01:00:02 CETWARNING: terminating connection because of crash of another server process
2016-03-15 01:00:02 CETDETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-03-15 01:00:02 CETHINT: In a moment you should be able to reconnect to the database and repeat your command.
2016-03-15 01:00:02 CETLOG: all server processes terminated; reinitializing
2016-03-15 01:00:02 CETLOG: could not remove shared memory segment "/PostgreSQL.1804289383": No such file or directory
2016-03-15 01:00:02 CETLOG: semctl(22839296, 0, IPC_RMID, ...) failed: Invalid argument
2016-03-15 01:00:02 CETLOG: semctl(22872065, 0, IPC_RMID, ...) failed: Invalid argument
2016-03-15 01:00:02 CETLOG: semctl(22904834, 0, IPC_RMID, ...) failed: Invalid argument

I am calling pg_dump in my cronjob like this:
su - mmppostgres -c "/Users/…/bin/pg_dump -p 5433 mydatabase_1_0_0 > /my_backup_path/mydatabase_1_0_0.dump"

After the crash the database runs in recovery mode. A restart of the database brings everything back to normal.
This crash is always reproducible and occurs every night during backup. When calling the same command via the command line everything run fine. In the system log I can see:

Mar 15 01:00:01 mymachine systemd[1]: Started User Manager for UID 1001.
Mar 15 01:00:02 mymachine systemd[1]: Stopping User Manager for UID 1001...
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Default.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Default.
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Basic System.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Basic System.
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Paths.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Paths.
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Timers.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Timers.
Mar 15 01:00:02 mymachine systemd[22241]: Stopping Sockets.
Mar 15 01:00:02 mymachine systemd[22241]: Stopped target Sockets.
Mar 15 01:00:02 mymachine systemd[22241]: Starting Shutdown.
Mar 15 01:00:02 mymachine systemd[22241]: Reached target Shutdown.
Mar 15 01:00:02 mymachine systemd[22241]: Starting Exit the Session...
Mar 15 01:00:02 mymachine systemd[22241]: Received SIGRTMIN+24 from PID 22249 (kill).

While searching through mailing lists I tried to exclude the following possibilities, which might cause the error:

1st possible solution: Another PostgreSQL instance running … no.
Some people get the ‘semctl’ error when running a second PostgreSQL process using the same semaphores. I can definitely exclude that. I have no other PostgreSQL instance running on the machine.

2nd possible solution: Checking with 'ipcs -s’ for another process deleting semaphores.
Before a crash my output looks like this:

------ Semaphore Arrays --------
key semid owner perms nsems
0x0052e6a9 20742144 mmppostgre 600 17
0x0052e6aa 20774913 mmppostgre 600 17
0x0052e6ab 20807682 mmppostgre 600 17
0x0052e6ac 20840451 mmppostgre 600 17
0x0052e6ad 20873220 mmppostgre 600 17
0x0052e6ae 20905989 mmppostgre 600 17
0x0052e6af 20938758 mmppostgre 600 17
0x0052e6b0 20971527 mmppostgre 600 17
0x0052e6b1 21004296 mmppostgre 600 17
0x0052e6b2 21037065 mmppostgre 600 17
0x0052e6b3 21069834 mmppostgre 600 17
0x0052e6b4 21102603 mmppostgre 600 17
0x0052e6b5 21135372 mmppostgre 600 17
0x0052e6b6 21168141 mmppostgre 600 17
0x00000000 21266446 www-data 600 1
0x00000000 21299215 www-data 600 1

After the crash the output looks like that:
------ Semaphore Arrays --------
key semid owner perms nsems
0x0052e6a9 22380544 mmppostgre 600 17
0x0052e6aa 22413313 mmppostgre 600 17
0x0052e6ab 22446082 mmppostgre 600 17
0x0052e6ac 22478851 mmppostgre 600 17
0x0052e6ad 22511620 mmppostgre 600 17
0x0052e6ae 22544389 mmppostgre 600 17
0x0052e6af 22577158 mmppostgre 600 17
0x0052e6b0 22609927 mmppostgre 600 17
0x0052e6b1 22642696 mmppostgre 600 17
0x0052e6b2 22675465 mmppostgre 600 17
0x0052e6b3 22708234 mmppostgre 600 17
0x0052e6b4 22741003 mmppostgre 600 17
0x0052e6b5 22773772 mmppostgre 600 17
0x0052e6b6 22806541 mmppostgre 600 17
0x00000000 21856270 www-data 600 1
0x00000000 21889039 www-data 600 1

As all server processes have been cancelled it seems to me normal, that the semids have been changed. Beyond that I can not see anything which might be useful for me.

3rd possible solution: Strange values for shared memory:
My Debian 8.3 default settings for shared memory have been:

kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096

These values have been looking strange to me, so I changed them to some more realistic values:
kernel.shmall = 4194304
kernel.shmmax = 17179869184
kernel.shmmni = 4096

4th possible solution: I have read here: https://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html
that logging into the postgres user in and out might delete semaphore memory, but I cannot reproduce this.

Interesting enough I have another PostgreSQL version running perfectly under Debian 8.3 on a different machine. This is a PostgreSQL 9.4.1. It uses the same backup mechanism. I hate to consider a downgrade as a possible solution. May be an info, which might be useful: my machine has 64 GB RAM.

Any ideas where to continue my search?

Best regards

Matthias Schmitt

magic moving pixel s.a.
23, Avenue Grande-Duchesse Charlotte
L-3441 Dudelange
Luxembourg
Phone: +352 54 75 75
http://www.mmp.lu

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John Lumby 2016-03-15 16:14:13 Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing
Previous Message Francisco Olarte 2016-03-15 15:05:03 Re: psql question: aborting a "script"