Re: More replication race conditions

From: Noah Misch <noah(at)leadboat(dot)com>
To: simon(at)2ndquadrant(dot)com
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: More replication race conditions
Date: 2017-08-27 02:32:49
Message-ID: 20170827023249.GD3963697@rfd.leadboat.com
Lists: pgsql-hackers

On Fri, Aug 25, 2017 at 12:09:00PM +0200, Petr Jelinek wrote:
> On 24/08/17 19:54, Tom Lane wrote:
> > sungazer just failed with
> >
> > pg_recvlogical exited with code '256', stdout '' and stderr 'pg_recvlogical: could not send replication command "START_REPLICATION SLOT "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" '1')": ERROR: replication slot "test_slot" is active for PID 8913148
> > pg_recvlogical: disconnected
> > ' at /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm line 1657.
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2017-08-24%2015%3A16%3A10
> >
> > Looks like we're still not there on preventing replication startup
> > race conditions.
>
> Hmm, that looks like "by design" behavior. Acquiring a slot will throw an
> error if the slot is already used by somebody else (slots use their own
> locking mechanism, which does not wait). In this case it seems the
> walsender that was using the slot for the previous step hadn't finished
> releasing it by the time we started the new command. We could work around
> this by changing the test to wait, perhaps.
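
The race Petr describes can be sketched with a toy Python model: acquiring a slot fails immediately instead of waiting, and the test works around it by waiting for the previous owner to let go. Everything here is hypothetical. ReplicationSlot, try_acquire, SlotInUseError and wait_for_slot_release are illustrative names, not PostgreSQL's C code or its Perl TAP test API. In a real TAP test, the equivalent wait would presumably poll the pg_replication_slots view until the slot's active column is false.

```python
import threading
import time


class SlotInUseError(Exception):
    """Hypothetical error raised when another process already holds the slot."""


class ReplicationSlot:
    """Toy slot whose acquire never waits (an assumption modeled on the thread)."""

    def __init__(self, name):
        self.name = name
        self._mutex = threading.Lock()  # protects active_pid only
        self.active_pid = None

    def try_acquire(self, pid):
        # Fail immediately instead of blocking, like the non-waiting slot lock.
        with self._mutex:
            if self.active_pid is not None and self.active_pid != pid:
                raise SlotInUseError(
                    f'replication slot "{self.name}" is active for PID {self.active_pid}'
                )
            self.active_pid = pid

    def release(self, pid):
        # Only the current owner can release the slot.
        with self._mutex:
            if self.active_pid == pid:
                self.active_pid = None

    def is_active(self):
        with self._mutex:
            return self.active_pid is not None


def wait_for_slot_release(slot, timeout=5.0, interval=0.01):
    """Test-side workaround: poll until the old owner releases the slot.

    Returns True once the slot is free, or False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while slot.is_active():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Polling is enough here only because nothing else competes for the slot once the old walsender exits. A process that slipped in between the check and the acquire would still get the error.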

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Simon,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
