Re: Parallel heap vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Parallel heap vacuum
Date: 2025-03-28 03:55:22
Message-ID: CAD21AoBRskqtyhfZYgrw4ZJ7Va05bF+5zJu=hD9X4Pz+bDjqFA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 26, 2025 at 1:00 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Mon, Mar 24, 2025 at 7:58 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > You're right. I've studied the read stream code and figured out how to
> > use it. In the attached patch, we end the read stream at the end of
> > phase 1 and start a new read stream, as you suggested.
>
> I've started looking at this patch set some more.

Thank you for reviewing the patch!

>
> In heap_vac_scan_next_block() if we are in the SKIP_PAGES_THRESHOLD
> codepath and run out of unskippable blocks in the current chunk and
> then go back to get another chunk (goto retry) but we are near the
> memory limit so we can't get another block
> (!dead_items_check_memory_limit()), could we get an infinite loop?
>
> Or even incorrectly encroach on another worker's block? Asking that
> because of this math
> end_block = next_block +
>     vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining + 1;

You're right. We should make sure that next_block is reset to
InvalidBlockNumber at the beginning of the retry loop.

>
> if vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining is 0
> and we are in the goto retry loop, it seems like we could keep
> incrementing next_block even when we shouldn't be.

Right. Will fix.
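
To make the fix concrete, here is a rough sketch of the top of the
retry loop (names follow the patch; the argument and polarity of
dead_items_check_memory_limit() are my reading of your description, so
treat this as a sketch rather than the final shape):

retry:
    /*
     * Reset next_block up front so that an exhausted chunk, or an early
     * exit due to the dead-items memory limit, can never leave a stale
     * value behind to be incremented into another worker's chunk.
     */
    next_block = InvalidBlockNumber;

    if (vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining == 0)
    {
        /*
         * The current chunk is exhausted.  Stop streaming blocks if we
         * are near the dead-items memory limit, so the read stream can
         * wind down; only otherwise fetch a new chunk and recompute
         * end_block from phsw_chunk_remaining.
         */
        if (!dead_items_check_memory_limit(vacrel))
            return InvalidBlockNumber;
    }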

>
> I just want to make sure that the skip pages optimization works with
> the parallel block assignment and the low memory read stream
> wind-down.
>
> I also think you do not need to expose
> table_block_parallelscan_skip_pages_in_chunk() in the table AM. It is
> only called in heap-specific code and the logic seems very
> heap-related. If table AMs want something to skip some blocks, they
> could easily implement it.

Agreed. Will remove it.
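
For the record, the replacement could be as small as a static helper
local to the heap code. This is only a hypothetical shape (the real
bookkeeping will mirror whatever the current callback does), using the
ParallelBlockTableScanWorkerData fields from relscan.h:

static inline void
heap_parallelscan_skip_pages_in_chunk(ParallelBlockTableScanWorker pbscanwork,
                                      BlockNumber nskip)
{
    Assert(nskip <= pbscanwork->phsw_chunk_remaining);

    /* Consume the skipped blocks from the chunk we already own. */
    pbscanwork->phsw_nallocated += nskip;
    pbscanwork->phsw_chunk_remaining -= nskip;
}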

>
> On another topic, I think it would be good to have a comment above
> this code in parallel_lazy_scan_gather_scan_results(), stating why we
> are very sure it is correct.
> Assert(TransactionIdIsValid(data->NewRelfrozenXid));
> Assert(MultiXactIdIsValid(data->NewRelminMxid));
>
> if (TransactionIdPrecedes(data->NewRelfrozenXid,
>                           vacrel->scan_data->NewRelfrozenXid))
>     vacrel->scan_data->NewRelfrozenXid = data->NewRelfrozenXid;
>
> if (MultiXactIdPrecedesOrEquals(data->NewRelminMxid,
>                                 vacrel->scan_data->NewRelminMxid))
>     vacrel->scan_data->NewRelminMxid = data->NewRelminMxid;
>
> if (data->nonempty_pages < vacrel->scan_data->nonempty_pages)
>     vacrel->scan_data->nonempty_pages = data->nonempty_pages;
>
> vacrel->scan_data->skippedallvis |= data->skippedallvis;
>
> Parallel relfrozenxid advancement sounds scary, and scary things are
> best with comments. Even though the way this works is intuitive, I
> think it is worth pointing out that this part is important to get
> right so future programmers know how important it is.
>
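
Agreed that this deserves a comment; perhaps something along these
lines (wording is only a sketch of the point worth capturing):

/*
 * Merge the per-worker results into the leader's scan state.  Each
 * worker computed NewRelfrozenXid/NewRelminMxid over only the blocks
 * it scanned, so the value that is safe for the whole table is the
 * oldest one across the leader and all workers: advancing relfrozenxid
 * or relminmxid past a value that any worker saw surviving would be
 * corruption.
 */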
> One thing I was wondering about is if there are any implications of
> different workers having different values in their GlobalVisState.
> GlobalVisState can be updated during vacuum, so even if they start out
> with the same values, that could diverge. It is probably okay since it
> just controls what tuples are removable. Some workers may remove fewer
> tuples than they absolutely could, and this is probably okay.
>

Good point.

> And if it is okay for each worker to have different GlobalVisState
> then maybe you shouldn't have a GlobalVisState in shared memory. If
> you look at GlobalVisTestFor() it just returns the memory address of
> that global variable in the backend. So, it seems like it might be
> better to just let each parallel worker use their own backend local
> GlobalVisState and not try to put it in shared memory and copy it from
> one worker to the other workers when initializing them. I'm not sure.
> At the very least, there should be a comment explaining why you've
> done it the way you have done it.

Agreed. IIUC it's not a problem even if parallel workers use their own
GlobalVisState. I'll make that change and remove the 0004 patch which
exposes GlobalVisState.
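
In other words, each worker can simply do its own lookup, since
GlobalVisTestFor() hands back a pointer to backend-local state anyway
(a minimal sketch, matching what the serial path already does):

    /* In each parallel vacuum worker, instead of copying the leader's
     * GlobalVisState through shared memory: */
    vacrel->vistest = GlobalVisTestFor(rel);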

I'll send the updated patch soon.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
