Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Date: 2022-10-17 17:34:02
Message-ID: CA+TgmoYLiS_uxHx77PSXDN9SQx004GPCXT7iB4WVO8LeRiDw+w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 17, 2022 at 1:02 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> That's true in general, but the case of fixing a bug in one place but not in
> another nearby is a different story.

I agree, but I still think we shouldn't let the perfect be the enemy
of the good.

> > That code indeed seems stupid, because as you say, there's no point in
> > calling _hash_getnewbuf() and thus overwriting the buffer and then only
> > afterwards checking IsBufferCleanupOK. By then the die is cast. But at the
> > same time, I think that it's not wrong in any way that matters to the best
> > of our current knowledge. That new buffer that we just went and got might be
> > pinned by somebody else, but they can't be doing anything interesting with
> > it, because we wouldn't be allocating it as a new page if it were already in
> > use for anything, and only one process is allowed to be doing such an
> > allocation at a time. That both means that we can likely remove the cleanup
> > lock acquisition, but it also means that if we don't, there is no
> > correctness problem here, strictly speaking.
>
> If that's the case cool - I just don't know the locking protocol of hash
> indexes well enough to judge this.

Darn, I was hoping you did, because I think this could certainly use
more than one pair of educated eyes.
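For anyone following along without the source open, the sequence being
discussed is roughly the following. I'm paraphrasing the split code in
hashpage.c from memory here, not quoting it exactly:

/*
 * Paraphrase, from memory, of the new-bucket setup in the split code.
 * The point is the ordering: _hash_getnewbuf() has already zeroed and
 * initialized the new bucket's page before we ever ask whether a cleanup
 * lock is actually available on it.
 */
buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);

if (!IsBufferCleanupOK(buf_nblkno))
{
    /* too late to pretend we never touched the page */
    _hash_relbuf(rel, buf_oblkno);
    _hash_relbuf(rel, buf_nblkno);
    goto fail;
}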

> > So I would suggest that if we feel we absolutely must put a comment
> > here, we could make it say something like "XXX. It doesn't make sense
> > to call _hash_getnewbuf() first, zeroing the buffer, and then only
> > afterwards check whether we have a cleanup lock. However, since no
> > scan can be accessing the new buffer yet, any concurrent accesses will
> > just be from processes like the bgwriter or checkpointer which don't
> > care about its contents, so it doesn't really matter."
>
> WFM. I'd probably lean to just fixing in the backbranches instead, but as long
> as we make a conscious decision...

I am reasonably confident that just making the cleanup lock
acquisition unconditional will not break anything that isn't broken
already. Perhaps that confidence will turn out to be misplaced, but at
the moment I just don't see what can go wrong. Since it's a standby,
nobody else should be trying to get a cleanup lock, and even if they
did, err, so what? We can't deadlock because we don't hold any other
locks.
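
To make "unconditional" concrete: instead of the check-and-PANIC pattern,
the replay code would just wait for the cleanup lock, which the redo
machinery already knows how to do. A rough sketch, written from memory and
not tested -- the block id and variable names are placeholders:

/*
 * Sketch only, not a tested patch.  Today the replay code effectively does
 *
 *     if (!IsBufferCleanupOK(buf))
 *         elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");
 *
 * The unconditional version asks the redo machinery to wait for the
 * cleanup lock (it uses LockBufferForCleanup() internally) when reading
 * the bucket page:
 */
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
                                       true,    /* get_cleanup_lock: wait */
                                       &buf);

/*
 * Waiting is safe here for the reasons above: on a standby nobody else
 * takes cleanup locks on these pages, a conflicting pin from the bgwriter
 * or checkpointer is transient, and we hold no other locks, so there is
 * no deadlock hazard.
 */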

I don't feel quite as confident that not attempting a cleanup lock on
the new bucket's primary page is OK. I think it should be fine. The
existing comment even says it should be fine. But, that comment could
be wrong, and I'm not sure that I have my head around what all of the
possible interactions around that cleanup lock are. So changing it
makes me a little nervous.
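
For clarity, the check I mean is the one on the just-initialized new
bucket page in the redo routine, which from memory looks something like
this (paraphrased, not an exact quote):

newbuf = XLogInitBufferForRedo(record, 1);
_hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
              xlrec->new_bucket_flag, true);

/*
 * This is the check that would be dropped.  The argument for dropping it
 * is the one quoted above: the page was just initialized during replay,
 * no scan can have found it yet, and any stray pin belongs to a process
 * that doesn't care about its contents.
 */
if (!IsBufferCleanupOK(newbuf))
    elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire cleanup lock");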

--
Robert Haas
EDB: http://www.enterprisedb.com
