From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: A failure in t/038_save_logical_slots_shutdown.pl |
Date: | 2024-01-13 11:12:55 |
Message-ID: | CAA4eK1+F5P9CEVNquD-fFd0-sxEkOPBGigM5iEcs8d8Yv+2HUg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jan 12, 2024 at 3:36 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Jan 12, 2024 at 9:28 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Jan 11, 2024 at 10:03 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jan 11, 2024 at 4:35 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > On further analysis, it was found that in the failing test,
> > > > CHECKPOINT_SHUTDOWN was started in a new page, so there was the WAL
> > > > page header present just before the CHECKPOINT_SHUTDOWN which was
> > > > causing the failure. We could alternatively reproduce the issue by
> > > > switching the WAL file before restarting the server like in the
> > > > attached test change patch.
> > > > There are a couple of ways to fix this issue a) one by switching the
> > > > WAL before the insertion of records so that the CHECKPOINT_SHUTDOWN
> > > > does not get inserted in a new page as in the attached test_fix.patch
> > > > b) by using pg_walinspect to check that the next WAL record is
> > > > CHECKPOINT_SHUTDOWN. I have to try this approach.
> > > >
> > > > Thanks to Bharath and Kuroda-san for offline discussions and helping
> > > > in getting to the root cause.
> > >
> > > IIUC, the problem the commit e0b2eed tries to solve is to ensure there
> > > are no left-over decodable WAL records between confirmed_flush LSN and
> > > a shutdown checkpoint, which is what it is expected from the
> > > t/038_save_logical_slots_shutdown.pl. How about we have a PG function
> > > returning true if there are any decodable WAL records between the
> > > given start_lsn and end_lsn?
> > >
> >
> > But, we already test this in 003_logical_slot during a successful
> > upgrade. Having an explicit test to do the same thing has some merits
> > but not sure if it is worth it.
>
> If the code added by commit e0b2eed is covered by the new upgrade
> test, why not remove 038_save_logical_slots_shutdown.pl altogether?
>
This is a more strict check because it is possible that even if the
latest confirmed_flush location is not persisted there is no
meaningful decodable WAL between whatever the last confirmed_flush
location saved on disk and the shutdown_checkpoint record.
Kuroda-San/Vignesh, do you have any suggestion on this one?
> > The current test tries to ensure that
> > during shutdown after we shutdown walsender and ensures that it sends
> > all the wal records and receipts an ack for the same, there is no
> > other WAL except shutdown_checkpoint. Vignesh's suggestion (a) makes
> > the test robust enough that it shouldn't show spurious failures like
> > the current one reported by you, so shall we try to proceed with that?
>
> Do you mean something like [1]? It ensures the test passes unless any
> writes are added (in future) before the publisher restarts in the test
> which can again make the tests flaky. How do we ensure no one adds
> anything in before the publisher restart
> 038_save_logical_slots_shutdown.pl? A note before the restart perhaps?
>
I am fine with adding the note.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2024-01-13 11:21:00 | Re: Invalidate the subscription worker in cases where a user loses their superuser status |
Previous Message | Amit Kapila | 2024-01-13 11:02:47 | Re: speed up a logical replica setup |