Re: persist logical slots to disk during shutdown checkpoint

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: persist logical slots to disk during shutdown checkpoint
Date: 2023-08-31 06:39:53
Message-ID: CAA4eK1Lx-XMd9-1KDwhZ=GzxTS8DLqbn8V8GYxLjvuLTq1W+fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 30, 2023 at 6:33 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, Aug 29, 2023 at 5:40 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >
> > I am looking at it. If you can wait till the end of the week, that
> > will be great.
>
> /*
> * Successfully wrote, unset dirty bit, unless somebody dirtied again
> - * already.
> + * already and remember the confirmed_flush LSN value.
> */
> SpinLockAcquire(&slot->mutex);
> if (!slot->just_dirtied)
> slot->dirty = false;
> + slot->last_saved_confirmed_flush = slot->data.confirmed_flush;
>
> If the slot->data.confirmed_flush gets updated concurrently between copying it
> to be written to the disk and when it's written to last_saved_confirmed_flush,
> we will miss one update. I think we need to update last_saved_confirmed_flush
> based on the cp.slotdata.confirmed_flush rather than
> slot->data.confirmed_flush.
>

Yeah, this appears to be a problem.

> We are setting last_saved_confirmed_flush for all types of slots but using it
> only when the slot is logical. Should we also set it only for logical slots?
>

We can do that but not sure if there is any advantage of it other than
adding extra condition. BTW, won't even confirmed_flush LSN be used
only for logical slots?

> /* first check whether there's something to write out */
> SpinLockAcquire(&slot->mutex);
> was_dirty = slot->dirty;
> slot->just_dirtied = false;
> + confirmed_flush_has_changed = (slot->data.confirmed_flush !=
> slot->last_saved_confirmed_flush);
>
> The confirmed_flush LSN should always move forward, otherwise there may not be
> enough WAL retained for the slot to work. I am wondering whether we should take
> an opportunity to make sure
> Assert(slot->data.confirmed_flush <= slot->last_saved_confirmed_flush)
>

Theoretically, what you are saying makes sense to me but we don't have
such a protection while updating the confirmed_flush LSN. It would be
better to first add such a protection for confirmed_flush LSN update
as a separate patch.

> - /* and don't do anything if there's nothing to write */
> - if (!was_dirty)
> + /* Don't do anything if there's nothing to write. See ReplicationSlot. */
> + if (!was_dirty &&
> + !(is_shutdown && SlotIsLogical(slot) && confirmed_flush_has_changed))
>
> Rather than complicating this condition, I wonder whether it's better to just
> set was_dirty = true when is_shutdown && SlotIsLogical(slot) &&
> confirmed_flush_has_changed) or even slot->dirty = true.
>

I think it is better to keep the slot's dirty property separate. But
we can introduce another variable that can be the result of both
was_dirty and other checks together, however, that doesn't seem much
better than the current code.

> +
> + /*
> + * We won't ensure that the slot is persisted after the confirmed_flush
> + * LSN is updated as that could lead to frequent writes. However, we need
> + * to ensure that we do persist the slots at the time of shutdown whose
> + * confirmed_flush LSN is changed since we last saved the slot to disk.
> + * This will help in avoiding retreat of the confirmed_flush LSN after
> + * restart. This variable is used to track the last saved confirmed_flush
> + * LSN value.
> + */
>
> This comment makes more sense in SaveSlotToPath() than here. We may decide to
> use last_saved_confirmed_flush for something else in future.
>

I have kept it here because it contains some information that is not
specific to SaveSlotToPath. So, it seems easier to follow the whole
theory if we keep it at the central place in the structure and then
add the reference wherever required but I am fine if you and others
feel strongly about moving this to SaveSlotToPath().

> +
> +sub compare_confirmed_flush
> +{
> + my ($node, $confirmed_flush_from_log) = @_;
> +
> + # Fetch Latest checkpoint location from the control file
> + my ($stdout, $stderr) =
> + run_command([ 'pg_controldata', $node->data_dir ]);
> + my @control_data = split("\n", $stdout);
> + my $latest_checkpoint = undef;
> + foreach (@control_data)
> + {
> + if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
> + {
> + $latest_checkpoint = $1;
> + last;
> + }
> + }
> + die "Latest checkpoint location not found in control file\n"
> + unless defined($latest_checkpoint);
> +
> + # Is it same as the value read from log?
> + ok( $latest_checkpoint eq $confirmed_flush_from_log,
> + "Check that the slot's confirmed_flush LSN is the same as the
> latest_checkpoint location"
> + );
>
> This function assumes that the subscriber will receive and confirm WAL upto
> checkpoint location and publisher's WAL sender will update it in the slot.
> Where is the code to ensure that? Does the WAL sender process wait for
> checkpoint
> LSN to be confirmed when shutting down?
>

Note, that we need to compare if all the WAL before the
shutdown_checkpoint WAL record is sent. Before the clean shutdown, we
do ensure that all the pending WAL is confirmed back. See the use of
WalSndDone() in WalSndLoop().

> +
> +# Restart the publisher to ensure that the slot will be persisted if required
> +$node_publisher->restart();
>
> Can we add this test comparing LSNs after publisher restart, to an existing
> test itself - like basic replication. That's the only extra thing that this
> test does beyond usual replication stuff.
>

As this is a test after the restart of the server, I thought to keep
it with recovery tests. However, I think once the upgrade (of
publisher nodes) patch is ready, we should keep this test with those
tests or somehow merge it with those tests but till that patch is
ready, let's keep this as a separate test.

> +
> +# Wait until the walsender creates decoding context
> +$node_publisher->wait_for_log(
> + qr/Streaming transactions committing after
> ([A-F0-9]+\/[A-F0-9]+), reading WAL from ([A-F0-9]+\/[A-F0-9]+)./,
> + $offset);
> +
> +# Extract confirmed_flush from the logfile
> +my $log_contents = slurp_file($node_publisher->logfile, $offset);
> +$log_contents =~
> + qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+),
> reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
> + or die "could not get confirmed_flush_lsn";
>
> Why are we figuring out the LSN from the log file? Is it not available from
> pg_replication_slots view? If we do so, I think this test will fail is the slot
> gets written after the restart because of concurrent activity on the publisher
> (like autovacuum, or other things that cause empty transaction to be
> replicated) and subscriber. A very rare chance but not 0 probability one.
>

Yes, that is a possibility.

> I
> think we should shut down subscriber, restart publisher and then make this
> check based on the contents of the replication slot instead of server log.
> Shutting down subscriber will ensure that the subscriber won't send any new
> confirmed flush location to the publisher after restart.
>

But if we shutdown the subscriber before the publisher there is no
guarantee that the publisher has sent all outstanding logs up to the
shutdown checkpoint record (i.e., the latest record). Such a guarantee
can only be there if we do a clean shutdown of the publisher before
the subscriber.

> All the places which call ReplicationSlotSave() mark the slot as dirty. All
> the places where SaveSlotToPath() is called, the slot is marked dirty except
> when calling from CheckPointReplicationSlots(). So I am wondering whether we
> should be marking the slot dirty in CheckPointReplicationSlots() and avoid
> passing down is_shutdown flag to SaveSlotToPath().
>

I feel that will add another spinlock acquire/release pair without
much benefit. Sure, it may not be performance-sensitive but still
adding another pair of lock/release doesn't seem like a better idea.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2023-08-31 06:47:31 Statistics Import and Export
Previous Message Michael Paquier 2023-08-31 05:33:56 Re: [PATCH] Add native windows on arm64 support