Re: Add two missing tests in 035_standby_logical_decoding.pl

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add two missing tests in 035_standby_logical_decoding.pl
Date: 2023-05-02 06:28:06
Message-ID: CAA4eK1LEQfC-ZxOkN4=cuAxB1N=2iLcEEb+Mb=XGCqGhwvMkaA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 28, 2023 at 2:24 PM Drouvot, Bertrand
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> > Can you
> > please explain the logic behind this test a bit more like how the WAL
> > file switch helps you to achieve the purpose?
> >
>
> The idea was to generate enough "wal switch" on the primary to ensure
> the WAL file has been removed.
>
> I gave another thought on it and I think we can skip the test that the WAL is
> not on the primary any more. That way, one "wal switch" seems to be enough
> to see it removed on the standby.
>
> It's done in V7.
>
> V7 is not doing "extra tests" than necessary and I think it's probably better like this.
>
> I can see V7 failing on "Cirrus CI / macOS - Ventura - Meson" only (other machines are not complaining).
>
> It does fail on "invalidated logical slots do not lead to retaining WAL", see https://cirrus-ci.com/task/4518083541336064
>
> I'm not sure why it is failing, any idea?
>

I think the reason for the failure is that, on the standby, the test is
not able to remove the WAL file corresponding to the invalid slot. You
are using pg_switch_wal() to generate a switch record, and I think you
need one more WAL-generating statement after that to achieve your
purpose, which is that during the checkpoint, the test removes the WAL
file corresponding to the invalid slot. Just doing a checkpoint on the
primary may not serve the need, as that doesn't lead to any new
insertion of WAL on the standby. Is your v6 failing in the same
environment? If not, then it is probably because the test does an
insert after pg_switch_wal() in that version. Why did you change the
order of the insert in v7?
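
To make the ordering argument above concrete, here is a toy model of the
hypothesis. It is only a sketch under stated assumptions: the class ToyWal,
its methods, and the cleanup rule ("drop every segment older than the one
currently in use") are hypothetical simplifications, not PostgreSQL's real
WAL removal logic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """Toy model of a WAL stream; NOT PostgreSQL's real recycling logic."""
    current: int = 1                 # segment that receives the next record
    segments: set = field(default_factory=lambda: {1})
    pending_switch: bool = False     # switch requested, next segment not opened yet

    def write(self):
        """Any WAL-generating statement (for example an INSERT)."""
        if self.pending_switch:
            # The first record after a switch opens the next segment.
            self.current += 1
            self.segments.add(self.current)
            self.pending_switch = False

    def switch(self):
        """Stand-in for pg_switch_wal(): the switch record ends the current segment."""
        self.pending_switch = True

    def checkpoint_cleanup(self):
        """Assumed rule: drop every segment older than the one currently in use."""
        self.segments = {s for s in self.segments if s >= self.current}


def segment_survives(statements):
    """Run the statements in order, clean up, and report whether segment 1 remains."""
    wal = ToyWal()
    for stmt in statements:
        getattr(wal, stmt)()
    wal.checkpoint_cleanup()
    return 1 in wal.segments
```

In this model, segment_survives(["write", "switch"]) is True because nothing
is written after the switch, so the old segment is still the one in use.
segment_survives(["switch", "write"]) is False, which corresponds to doing
the insert after pg_switch_wal().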

BTW, you can confirm the failure by changing the DEBUG2 message in
RemoveOldXlogFiles() to LOG. In the case where the test fails, it may
not remove the WAL file corresponding to the invalid slot, whereas it
will remove that WAL file when the test succeeds.

--
With Regards,
Amit Kapila.
