Re: PANIC: wrong buffer passed to visibilitymap_clear

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PANIC: wrong buffer passed to visibilitymap_clear
Date: 2021-04-12 18:03:13
Message-ID: CAH2-WznPo0D2t9fBgK5jKjprdLxsJvy_PnnFxUeF2ftFxXstsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 12, 2021 at 9:19 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So I think we have to stick with the current basic design, and just
> tighten things up to make sure that visibility pins are accounted for
> in the places that are missing it.
>
> Hence, I propose the attached. It passes check-world, but that proves
> absolutely nothing of course :-(. I wonder if there is any way to
> exercise these code paths deterministically.

This approach seems reasonable to me. At least you've managed to
structure the visibility map page pin check as concomitant with the
existing space recheck.

> (I have realized BTW that I was exceedingly fortunate to reproduce
> the buildfarm report here --- I have run hundreds of additional
> cycles of the same test scenario without getting a second failure.)

In the past I've had luck with RR's chaos mode (most notably with the
Jepsen SSI bug). That didn't work for me here, though I might just
have not persisted with it for long enough. I should probably come up
with a shell script that runs the same thing hundreds of times or more
in chaos mode, while making sure that useless recordings don't
accumulate.

The feature is described here:

https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mode.html

You only have to be lucky once. Once that happens, you're left with a
recording to review and re-review at your leisure. This includes all
Postgres backends, maybe even pg_regress and other scaffolding (if
that's what you're after).

But that's for debugging, not testing. The only way that we'll ever be
able to test stuff like this is with something like Alexander
Korotkov's stop events patch [1]. That infrastructure should be added
sooner rather than later.

[1] https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp+Z++i4BGQoffKip6JDWngEA+g7Z-XmQ@mail.gmail.com
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-04-12 18:23:15 Re: [PATCH] Identify LWLocks in tracepoints
Previous Message Nandni Mehla 2021-04-12 17:56:48 Proposal for working on open source with PostgreSQL