Re: Create replication slot in pg_basebackup if requested and not yet present

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Create replication slot in pg_basebackup if requested and not yet present
Date: 2017-09-12 20:39:20
Message-ID: 1505248760.16872.1.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Am Dienstag, den 12.09.2017, 08:53 -0400 schrieb Peter Eisentraut:
> On 9/11/17 03:11, Michael Banck wrote:
> > > Is there a race condition here? The slot is created after the checkpoint
> > > is completed. But you have to start streaming from the LSN where the
> > > checkpoint started, so shouldn't the slot be created before the checkpoint
> > > is started?
> >
> > So my patch only moves the slot creation slightly further forward,
> > AFAICT.
> >
> > AIUI, wal streaming always begins at last checkpoint and from my tests
> > the restart_lsn of the created replication slot is also before that
> > checkpoint's lsn. However, I hope somebody more familiar with the
> > WAL/replication slot code could comment on that. What I dropped in the
> > refactoring is the RESERVE_WAL that used to be there when the temporary
> > slot gets created, I have readded that now.
>
> Maybe there is an argument to be made here about whether this is correct
> or not, but why bother and risk the fragility? Why not create the
> replication slot first thing. I would put it after the server version
> checks and before we write recovery.conf.

The replication slots are created via the replication protocol through
the second background connection that is used for WAL streaming in
StartLogStreamer().

By their nature temporary replication slots must be created by that WAL
streamer using them and cannot be created by the main connection which
initiates the snapshot (or else you get a "replication slot
"pg_basebackup_XXX" is active for PID XXX" error in the WAL streamer).
So ISTM we cannot rip out CreateReplicationSlot() (or the manual
CREATE_REPLICATION_SLOT that is currently in master) from
StartLogStreamer() at least for temporary slots.

We could split up the logic here and create the optional physical
replication slot in the main connection and the temporary one in the WAL
streamer connection, but this would keep any fragility around for
(likely more frequently used) temporary replication slots. It would make
the patch much smaller though if I revert touching temporary slots at
all.

The alternative would be to call StartLogStreamer() earlier, but it
requires xlogstart as argument so we cannot move its call before the
checkpoint is taken and xlogstart is determined, the earliest I managed
was when starttli is determined, which is just few instructions earlier
than now.

Thoughts?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-09-12 20:44:34 Re: Patches that don't apply or don't compile: 2017-09-12
Previous Message Chapman Flack 2017-09-12 20:12:00 Re: Faster methods for getting SPI results