From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | ronan(dot)dunklau(at)aiven(dot)io |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pg_receivewal starting position |
Date: | 2021-07-29 05:32:37 |
Message-ID: | 20210729.143237.1445082330516607084.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Wed, 28 Jul 2021 12:57:39 +0200, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote in
> Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit :
> > At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
> > wrote in
> > > I don't know if it should be the default, toggled by a command line flag,
> > > or if we even should let the user provide a LSN.
> >
> > *I* think it is completely reasonable (or at least convenient or less
> > astonishing) that pg_receivewal starts from the restart_lsn of the
> > replication slot to use. The tool already decides the clean-start LSN
> > a bit unusual way. And it seems to me that proposed behavior can be
> > the default when -S is specified.
> >
>
> As of now we can't get the replication_slot restart_lsn with a replication
> connection AFAIK.
>
> This implies that the patch could require the user to specify a maintenance-db
> parameter, and we would use that if provided to fetch the replication slot
> info, or fallback to the previous behaviour. I don't really like this approach
> as the behaviour changing wether we supply a maintenance-db parameter or not
> is error-prone for the user.
>
> Another option would be to add a new replication command (for example
> ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the
> current one, and return some info about it (restart_lsn at least for a
> physical slot).
I didn't thought in details. But I forgot that ordinary SQL commands
have been prohibited in physical replication connection. So we need a
new replication command but it's not that a big deal.
> I don't see any reason not to make it work for logical replication connections
> / slots, but it wouldn't be that useful since we can query the database in
> that case.
Ordinary SQL queries are usable on a logical replication slot so
I'm not sure how logical replication connection uses the command.
However, like you, I wouldn't bother restricting the command to
physical replication, but perhaps the new command should return the
slot type.
> Acquiring the replication slot instead of just reading it would make sure that
> no other process could start the replication between the time we read the
> restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then
> check if we already have a replication slot, and ensure it is the same one as
> the one we're trying to use.
I'm not sure it's worth adding complexity for such strictness.
START_REPLICATION safely fails if someone steals the slot meanwhile.
In the first place there's no means to protect a slot from others
while idle. One possible problem is the case where START_REPLICATION
successfully acquire the slot after the new command failed. But that
case doesn't seem worse than the case someone advances the slot while
absence. So I think READ_REPLICATION_SLOT is sufficient.
> From pg_receivewal point of view, this would amount to:
>
> - check if we currently have wal in the target directory.
> - if we do, proceed as currently done, by computing the start lsn and
> timeline from the last archived wal
> - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the
> restart_lsn as the start lsn if there is one, and don't provide a timeline
> - if we still don't have a start_lsn, fallback to using the current server
> wal position as is done.
That's pretty much it.
> What do you think ? Which information should we provide about the slot ?
We need the timeline id to start with when using restart_lsn. The
current timeline can be used in most cases but there's a case where
the LSN is historical.
pg_receivewal doesn't send a replication status report when a segment
is finished. So after pg_receivewal stops just after a segment is
finished, the slot stays at the beginning of the last segment. Thus
next time it will start from there, creating a duplicate segment.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Amul Sul | 2021-07-29 05:47:16 | Re: needless complexity in StartupXLOG |
Previous Message | kuroda.hayato@fujitsu.com | 2021-07-29 05:12:57 | [postgres_fdw] add local pid to fallback_application_name |