From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date: 2022-08-17 00:57:40
Message-ID: 20220817005740.bgzxg56plejdmxsw@awork3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2022-08-16 19:55:18 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > What sort of random things would someone do with pageinspect functions
> > that would hold buffer pins on one buffer while locking another one?
> > The functions in hashfuncs.c don't even seem like they would access
> > multiple buffers in total, let alone at overlapping times. And I don't
> > think that a query pageinspect could realistically be suspended while
> > holding a buffer pin either. If you wrapped it in a cursor it'd be
> > suspended before or after accessing any given buffer, not right in the
> > middle of that operation.
>
> pin != access. Unless things have changed really drastically since
> I last looked, a seqscan will sit on a buffer pin throughout the
> series of fetches from a single page.
That's still the case. But for heap that shouldn't be a problem, because we'll
never try to take a cleanup lock while holding another page locked (nor even
another heap page pinned, I think).
I find it *highly* suspect that hash needs to acquire a cleanup lock while
holding another buffer locked. The recovery aspect alone makes that seem quite
unwise, even if, for one reason or another, there's no actual deadlock here.
Looking at the non-recovery code makes me even more suspicious:
    /*
     * Physically allocate the new bucket's primary page.  We want to do this
     * before changing the metapage's mapping info, in case we can't get the
     * disk space.  Ideally, we don't need to check for cleanup lock on new
     * bucket as no other backend could find this bucket unless meta page is
     * updated.  However, it is good to be consistent with old bucket locking.
     */
    buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
    if (!IsBufferCleanupOK(buf_nblkno))
    {
        _hash_relbuf(rel, buf_oblkno);
        _hash_relbuf(rel, buf_nblkno);
        goto fail;
    }
_hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
memset(0)s the whole page. What does it even mean to check whether you
effectively have a cleanup lock after you zeroed out the page?
Reading the README and the comment above makes me wonder if this whole cleanup
lock business here is just cargo-culting and could be dropped?
> Admittedly, that's about *heap* page pins while indexscans have
> different rules. But I recall that btrees at least use persistent
> pins as well.
I think that's been changed, although not in an unproblematic way.
> Having said that, you're right that this is qualitatively different
> from the other bug, in that this is a deadlock not apparent data
> corruption. However, IIUC it's an LWLock deadlock, which we don't
> handle all that nicely.
Theoretically the startup side could be interrupted. Except that we don't
accept the startup process dying...
Greetings,
Andres Freund