Re: Fix race condition in InvalidatePossiblyObsoleteSlot()

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, exclusion(at)gmail(dot)com
Subject: Re: Fix race condition in InvalidatePossiblyObsoleteSlot()
Date: 2024-02-19 07:47:23
Message-ID: ZdMHi/yKAd+fJBjd@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Feb 19, 2024 at 03:14:07PM +0900, Michael Paquier wrote:
> On Thu, Feb 15, 2024 at 11:24:59AM +0000, Bertrand Drouvot wrote:
> > Thanks for looking at it!
> >
> > Agree, using an assertion instead in v3 attached.
>
> And you did not forget the PG_USED_FOR_ASSERTS_ONLY.

Thanks to the "CompilerWarnings" cirrus test ;-)

>
> > > It seems important to document why we are saving this data here; while
> > > we hold the LWLock for repslots, before we perform any termination,
> > > and we want to protect conflict reports with incorrect data if the
> > > slot data got changed while the lwlock is temporarily released while
> > > there's a termination.
> >
> > Yeah, comments added in v3.
>
> The contents look rather OK, I may do some word-smithing for both.

Thanks!

> >> For example just after the kill()? It seems to me
> >> that we should stuck the checkpointer, perform a forced upgrade of the
> >> xmins, release the checkpointer and see the effects of the change in
> >> the second loop. Now, modules/injection_points/ does not allow that,
> >> yet, but I've posted a patch among these lines that can stop a process
> >> and release it using a condition variable (should be 0006 on [1]). I
> >> was planning to start a new thread with a patch posted for the next CF
> >> to add this kind of facility with a regression test for an old bug,
> >> the patch needs a few hours of love, first. I should be able to post
> >> that next week.
> >
> > Great, that looks like a good idea!
>
> I've been finally able to spend some time on what I had in mind and
> posted it here for the next CF:
> https://www.postgresql.org/message-id/ZdLuxBk5hGpol91B@paquier.xyz
>
> You should be able to reuse that the same way I do in 0002 posted on
> the thread, where I'm causing the checkpointer to wait, then wake it
> up.

Thanks! I'll look at it.

> > Are we going to fix this on master and 16 stable first and then later on add a
> > test on master with the injection points?
>
> Still, the other patch is likely going to take a couple of weeks
> before getting into the tree, so I have no objection to fix the bug
> first and backpatch, then introduce a test. Things have proved that
> failures could show up in the buildfarm in this area, so more time
> running this code across two branches is not a bad concept, either.

Fully agree.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-02-19 07:51:45 Re: Injection points: some tools to wait and wake
Previous Message Nazir Bilal Yavuz 2024-02-19 07:28:05 Re: Show WAL write and fsync stats in pg_stat_io