Re: replication slot restart_lsn initialization

From: Andres Freund <andres(at)anarazel(dot)de>
To: Gurjeet Singh <gurjeet(at)singh(dot)im>
Cc: "Duran, Danilo" <danilod(at)amazon(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: replication slot restart_lsn initialization
Date: 2015-06-10 15:07:38
Message-ID: 20150610150738.GD10551@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-06-10 08:00:28 -0700, Gurjeet Singh wrote:
> Attached is the patch that takes the former approach (initialize
> restart_lsn when the slot is created).

If it's an option that's imo a sane approach.

> pg_create_logical_replication_slot() prevents LSN from being
> recycled that by looping (worst case 2 times) until there's no
> conflict with the checkpointer recycling the segment. So I have used
> the same approach.

There's no need to change anything for logical slots? Or do you just
mean that you moved the existing code?
>
> /*
> + * Grab and save an LSN value to prevent WAL recycling past that point.
> + */
> +void
> +ReplicationSlotRegisterRestartLSN()
> +{
> + ReplicationSlot *slot = MyReplicationSlot;
> +
> + Assert(slot != NULL);
> + Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
> +
> + /*
> + * The replication slot mechanism is used to prevent removal of required
> + * WAL. As there is no interlock between this and checkpoints required, WAL
> + * segment could be removed before ReplicationSlotsComputeRequiredLSN() has
> + * been called to prevent that. In the very unlikely case that this happens
> + * we'll just retry.
> + */
> + while (true)
> + {
> + XLogSegNo segno;
> +
> + /*
> + * Let's start with enough information if we can, so log a standby
> + * snapshot and start decoding at exactly that position.
> + */
> + if (!RecoveryInProgress())
> + {
> + XLogRecPtr flushptr;
> +
> + /* start at current insert position */
> + slot->data.restart_lsn = GetXLogInsertRecPtr();
> +
> + /* make sure we have enough information to start */
> + flushptr = LogStandbySnapshot();
> +
> + /* and make sure it's fsynced to disk */
> + XLogFlush(flushptr);
> + }
> + else
> + slot->data.restart_lsn = GetRedoRecPtr();
> +
> + /* prevent WAL removal as fast as possible */
> + ReplicationSlotsComputeRequiredLSN();
> +
> + /*
> + * If all required WAL is still there, great, otherwise retry. The
> + * slot should prevent further removal of WAL, unless there's a
> + * concurrent ReplicationSlotsComputeRequiredLSN() after we've written
> + * the new restart_lsn above, so normally we should never need to loop
> + * more than twice.
> + */
> + XLByteToSeg(slot->data.restart_lsn, segno);
> + if (XLogGetLastRemovedSegno() < segno)
> + break;
> + }
> +}

That doesn't look right to me. Why is this code logging a standby
snapshot for physical slots?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2015-06-10 15:12:46 Re: s_lock() seems too aggressive for machines with many sockets
Previous Message Nils Goroll 2015-06-10 15:06:19 Re: s_lock() seems too aggressive for machines with many sockets