Re: [PATCH] Fix drop replication slot blocking instead of returning error

From: Simone Gotti <simone(dot)gotti(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix drop replication slot blocking instead of returning error
Date: 2017-08-29 11:42:05
Message-ID: CAEvsy6UhW+cEuTcz1_jbynRhKeiZn9fA95dd4z=VS7D6LgvUrg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 29, 2017 at 12:13 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
>

Hi Alvaro,

> Simone Gotti wrote:
> > Hi all,
> >
> > I noticed that in postgres 10beta3, calling pg_drop_replication_slot on an
> > active slot will block until it's released instead of returning an error
> > like done in pg 9.6. Since this is a change in the previous behavior and the
> > docs weren't changed, I made a patch to restore the previous behavior.
>
> Changing that behavior was the entire point of the cited commit.

Sorry, since the docs weren't changed I assumed the new behavior was
only needed for future internal functions.

>
> A better fix, from my perspective, is to amend the docs as per the
> attached patch. This is what would be useful for logical replication,
> which is what replication slots were invented for in the first place.

I don't know why the new behavior is better for logical replication,
so I'll trust you on that. We use replication slots for physical
replication.

> If you disagree, let's discuss what other use cases you have, and we can
> come up with alternatives that satisfy both.

I just faced the opposite problem: in stolon [1] we currently rely on
the previous behavior, i.e. we don't want to block waiting for a slot
to be released (which under some partitioning conditions might not
happen for a long time), but prefer to keep retrying the drop
later. For now we partially avoid blocking by timing out the drop
operation after a few seconds.
Another idea would be to query the slot status before doing the drop,
but that leads to a race condition (probably the opposite of the one
that commit was fixing) if the slot is acquired between the query and
the drop.
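
A minimal sketch of the timeout-and-retry workaround described above,
not stolon's actual code. It assumes a hypothetical run_sql callable
that executes one statement and raises TimeoutError when the server
cancels it (a real driver raises its own error type, which would have to
be mapped), and that statement_timeout interrupts the blocked drop.

```python
import re

# Slot names may only contain lower case letters, digits and underscores.
SLOT_NAME_RE = re.compile(r"[a-z0-9_]+")


def try_drop_slot(run_sql, slot_name, timeout_ms=5000):
    """Attempt one drop; return True if dropped, False if it timed out.

    run_sql is a hypothetical callable (an assumption of this sketch):
    it runs a single SQL statement and raises TimeoutError on cancel.
    """
    if not SLOT_NAME_RE.fullmatch(slot_name):
        raise ValueError("invalid slot name: %r" % slot_name)
    try:
        # Bound the wait so a busy slot cannot block the caller forever.
        run_sql("SET statement_timeout = %d" % int(timeout_ms))
        run_sql("SELECT pg_drop_replication_slot('%s')" % slot_name)
    except TimeoutError:
        # Slot still in use: give up for now, the caller retries later.
        return False
    return True
```

The caller would invoke try_drop_slot again on its next reconciliation
pass whenever it returns False, so no status query is needed beforehand
and the query-then-drop race is avoided.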

> I think a decent answer,
> but one which would create a bit of extra churn, would be to have an
> optional boolean flag in the command/function for "nowait", instead of
> hardcoding either behavior.

I think this would be the best fix. I'm not sure what the policy for
these commands is, or whether backward compatibility is preferred (in
that case the old behavior should remain the default and a new "wait"
flag could be added instead).
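
To make the two behaviors concrete, here is a toy model of the proposed
flag, not PostgreSQL code. ToySlot, SlotInUse and the nowait parameter
are hypothetical names for illustration; a plain lock stands in for "the
slot is active".

```python
import threading


class SlotInUse(Exception):
    """Raised when nowait is set and the slot is currently active."""


class ToySlot:
    def __init__(self):
        # Held for as long as some connection is using the slot.
        self._active = threading.Lock()
        self.dropped = False

    def acquire(self):
        self._active.acquire()

    def release(self):
        self._active.release()

    def drop(self, nowait=False):
        # nowait=True: fail immediately if the slot is active (9.6 style).
        # nowait=False: block until it is released (10beta3 style).
        if not self._active.acquire(blocking=not nowait):
            raise SlotInUse("slot is active")
        try:
            self.dropped = True
        finally:
            self._active.release()
```

Which value is the default is exactly the backward compatibility
question raised above; the model works the same either way.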

If the default behavior is going to change, we will have to add a
separate code path for postgres >= 10.
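
Such a version branch could key off server_version_num. A small sketch,
assuming the blocking behavior seen in 10beta3 ships unchanged; the
function name is hypothetical.

```python
def drop_may_block(server_version_num):
    """Return True if dropping an active slot is expected to block.

    server_version_num is the integer reported by the server, e.g.
    90605 for 9.6.5 and 100000 for 10.0 (including the 10 betas).
    Assumption: the blocking behavior of 10beta3 is kept in 10 final.
    """
    return server_version_num >= 100000
```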

Thanks,
Simone.

[1] https://github.com/sorintlab/stolon

>
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
