From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | More race conditions in logical replication |
Date: | 2017-07-02 23:54:48 |
Message-ID: | 11904.1499039688@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I noticed a recent failure that looked suspiciously like a race condition:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2017-07-02%2018%3A02%3A07
The critical bit in the log file is
error running SQL: 'psql:<stdin>:1: ERROR: could not drop the replication slot "tap_sub" on publisher
DETAIL: The error was: ERROR: replication slot "tap_sub" is active for PID 3866790'
while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION tap_sub' at /home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1198.
After poking at it a bit, I found that I can cause several different
failures of this ilk in the subscription tests by injecting delays at
the points where a slot's active_pid is about to be cleared, as in the
attached patch (which also adds some extra printouts for debugging
purposes; none of that is meant for commit). It seems clear that there
is inadequate interlocking going on when we kill and restart a logical
replication worker: we're trying to start a new one before the old one has
released the slot.
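
To make the failure mode concrete, here is a toy simulation of the interlocking problem described above. It is only a sketch and is not PostgreSQL code: the `Slot` class, `SlotBusy`, `restart_worker`, and the PIDs 1001/1002 are all hypothetical names invented for illustration, and the timer merely stands in for the injected delay before `active_pid` is cleared. The new worker either tries to take the slot immediately (and hits an "is active for PID" error like the one in the log) or waits until the old worker has let go.

```python
import threading


class SlotBusy(Exception):
    """Raised when the slot is still owned by another (simulated) process."""


class Slot:
    """Toy stand-in for a replication slot; not PostgreSQL's data structure."""

    def __init__(self, name):
        self.name = name
        self.active_pid = None
        self._cond = threading.Condition()

    def acquire(self, pid, wait=False, timeout=5.0):
        with self._cond:
            if wait:
                # Block until the previous owner has cleared active_pid.
                self._cond.wait_for(lambda: self.active_pid is None, timeout)
            if self.active_pid is not None:
                raise SlotBusy(
                    f'replication slot "{self.name}" is active for PID {self.active_pid}'
                )
            self.active_pid = pid

    def release(self):
        with self._cond:
            self.active_pid = None
            self._cond.notify_all()


def restart_worker(wait, delay=0.1):
    """Kill the old worker and start a new one; return what the new one saw."""
    slot = Slot("tap_sub")
    slot.acquire(pid=1001)  # the old worker currently owns the slot

    may_release = threading.Event()

    def old_worker():
        # Simulates the window between being told to exit and actually
        # clearing active_pid.
        may_release.wait()
        slot.release()

    old = threading.Thread(target=old_worker)
    old.start()

    # In the waiting case, the old worker lets go after `delay` seconds.
    timer = threading.Timer(delay, may_release.set)
    if wait:
        timer.start()

    try:
        slot.acquire(pid=1002, wait=wait)  # the new worker starts up
        outcome = "acquired"
    except SlotBusy as exc:
        outcome = str(exc)
    finally:
        may_release.set()  # always let the old worker finish
        timer.cancel()
        old.join()
    return outcome


if __name__ == "__main__":
    print("no interlock:", restart_worker(wait=False))
    print("with wait   :", restart_worker(wait=True))
```

Waiting on a condition variable is just one possible form of interlock, chosen here because it is short to write down; it is not a proposal for how the actual fix should look.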
I'm not particularly interested in fixing this myself, so I'm just
going to add it to the open items list.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
break_repl_slot_management.patch | text/x-diff | 2.2 KB |