Re: PANIC: wrong buffer passed to visibilitymap_clear

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PANIC: wrong buffer passed to visibilitymap_clear
Date: 2021-04-12 20:40:46
Message-ID: 20210412204046.p3mh7wwfs44f2qid@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-04-11 13:55:30 -0400, Tom Lane wrote:
> Either way, it's hard to argue that heap_update hasn't crossed the
> complexity threshold where it's impossible to maintain safely. We
> need to simplify it.

Yea, I think we're well beyond that point. I can see a few possible
steps to wrangle the existing complexity into an easier to understand
shape:

- Rename heapam.c goto labels, they're useless to understand what is
happening.

- Move HeapTupleSatisfiesUpdate() call and the related branches
afterwards into its own function.

- Move "temporarily mark it locked" branch into its own function. It's a
minimal implementation of tuple locking, so it seems more than
separate enough.

- Move the "store the new tuple" part into its own function (pretty much
the critical section).

- Probably worth unifying the exit paths - there's a fair bit of
duplication by now...

Half related:

- I think we might also need to do something about the proliferation of
bitmaps in heap_update(). We now separately allocate 5 bitmapsets -
that strikes me as fairly insane.

However, these would not really address the complexity in itself, just
make it easier to manage.

ISTM that a lot of the complexity is related to needing to retry (and
avoiding doing so unnecessarily), which in turn is related to avoiding
deadlocks. We actually know how to not need that to the same degree -
the (need_toast || newtupsize > pagefree) locks the tuple and afterwards
has a lot more freedom. We obviously can't just always do that, due to
the WAL logging overhead.

I wonder if we could make that path avoid the WAL logging overhead. We
don't actually need a full blown tuple lock, potentially even with its
own multixact, here.

The relevant comment (in heap_lock_tuple()) says:
/*
* XLOG stuff. You might think that we don't need an XLOG record because
* there is no state change worth restoring after a crash. You would be
* wrong however: we have just written either a TransactionId or a
* MultiXactId that may never have been seen on disk before, and we need
* to make sure that there are XLOG entries covering those ID numbers.
* Else the same IDs might be re-used after a crash, which would be
* disastrous if this page made it to disk before the crash. Essentially
* we have to enforce the WAL log-before-data rule even in this case.
* (Also, in a PITR log-shipping or 2PC environment, we have to have XLOG
* entries for everything anyway.)
*/

But I don't really think that doing full-blown WAL tuple-locking WAL
logging is really the right solution.

- The "next xid" concerns are at least as easily solved by WAL logging a
distinct "highest xid assigned" record when necessary. Either by
having a shared memory variable saying "latestLoggedXid" or such, or
by having end-of-recovery advance nextXid to beyond what ExtendCLOG()
extended to. That reduces the overhead to at most once-per-xact (and
commonly smaller) or nothing, respectively.

- While there's obviously a good bit of simplicity ensuring that a
replica is exactly the same ("Also, in a PITR log-shipping or 2PC
environment ..."), we don't actually enforce that strictly anyway -
so I am not sure why it's necessary to pay the price here.

But maybe I'm all wet here, I certainly haven't had enough coffee.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-04-12 20:42:03 Re: Allowing to create LEAKPROOF functions to non-superuser
Previous Message Tom Lane 2021-04-12 20:37:01 Re: Allowing to create LEAKPROOF functions to non-superuser