From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Parallel heap vacuum
Date: 2025-04-01 21:30:08
Message-ID: CAD21AoCdxc6jLfk5fc1a5-2DgxFikrjFPa6-A5b8pn27i4yKRg@mail.gmail.com
Lists: pgsql-hackers
On Mon, Mar 31, 2025 at 2:05 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Mar 27, 2025 at 11:40 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 27, 2025 at 8:55 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Mar 26, 2025 at 1:00 PM Melanie Plageman
> > > <melanieplageman(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Mar 24, 2025 at 7:58 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > You're right. I've studied the read stream code and figured out how to
> > > > > use it. In the attached patch, we end the read stream at the end of
> > > > > phase 1 and start a new read stream, as you suggested.
> > > >
> > > > I've started looking at this patch set some more.
> > >
> > > Thank you for reviewing the patch!
> > >
> > > >
> > > > In heap_vac_scan_next_block() if we are in the SKIP_PAGES_THRESHOLD
> > > > codepath and run out of unskippable blocks in the current chunk and
> > > > then go back to get another chunk (goto retry) but we are near the
> > > > memory limit so we can't get another block
> > > > (!dead_items_check_memory_limit()), could we get an infinite loop?
> > > >
> > > > Or even incorrectly encroach on another worker's block? Asking that
> > > > because of this math
> > > > end_block = next_block +
> > > >     vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining + 1;
> > >
> > > You're right. We should make sure that next_block is reset to
> > > InvalidBlockNumber at the beginning of the retry loop.
> > >
> > > >
> > > > if vacrel->plvstate->scanworker->pbscanwork.phsw_chunk_remaining is 0
> > > > and we are in the goto retry loop, it seems like we could keep
> > > > incrementing next_block even when we shouldn't be.
> > >
> > > Right. Will fix.
> > >
> > > >
> > > > I just want to make sure that the skip pages optimization works with
> > > > the parallel block assignment and the low memory read stream
> > > > wind-down.
> > > >
> > > > I also think you do not need to expose
> > > > table_block_parallelscan_skip_pages_in_chunk() in the table AM. It is
> > > > only called in heap-specific code and the logic seems very
> > > > heap-related. If table AMs want something to skip some blocks, they
> > > > could easily implement it.
> > >
> > > Agreed. Will remove it.
> > >
> > > >
> > > > On another topic, I think it would be good to have a comment above
> > > > this code in parallel_lazy_scan_gather_scan_results(), stating why we
> > > > are very sure it is correct.
> > > > Assert(TransactionIdIsValid(data->NewRelfrozenXid));
> > > > Assert(MultiXactIdIsValid(data->NewRelminMxid));
> > > >
> > > > if (TransactionIdPrecedes(data->NewRelfrozenXid,
> > > > vacrel->scan_data->NewRelfrozenXid))
> > > > vacrel->scan_data->NewRelfrozenXid = data->NewRelfrozenXid;
> > > >
> > > > if (MultiXactIdPrecedesOrEquals(data->NewRelminMxid,
> > > > vacrel->scan_data->NewRelminMxid))
> > > > vacrel->scan_data->NewRelminMxid = data->NewRelminMxid;
> > > >
> > > > if (data->nonempty_pages < vacrel->scan_data->nonempty_pages)
> > > > vacrel->scan_data->nonempty_pages = data->nonempty_pages;
> > > >
> > > > vacrel->scan_data->skippedallvis |= data->skippedallvis;
> > > >
> > > > Parallel relfrozenxid advancement sounds scary, and scary things are
> > > > best with comments. Even though the way this works is intuitive, I
> > > > think it is worth pointing out that this part is important to get
> > > > right so future programmers know how important it is.
> > > >
> > > > One thing I was wondering about is if there are any implications of
> > > > different workers having different values in their GlobalVisState.
> > > > GlobalVisState can be updated during vacuum, so even if they start out
> > > > with the same values, that could diverge. It is probably okay since it
> > > > just controls what tuples are removable. Some workers may remove fewer
> > > > tuples than they absolutely could, and this is probably okay.
> > > >
> > >
> > > Good point.
> > >
> > > > And if it is okay for each worker to have different GlobalVisState
> > > > then maybe you shouldn't have a GlobalVisState in shared memory. If
> > > > you look at GlobalVisTestFor() it just returns the memory address of
> > > > that global variable in the backend. So, it seems like it might be
> > > > better to just let each parallel worker use their own backend local
> > > > GlobalVisState and not try to put it in shared memory and copy it from
> > > > one worker to the other workers when initializing them. I'm not sure.
> > > > At the very least, there should be a comment explaining why you've
> > > > done it the way you have done it.
> > >
> > > Agreed. IIUC it's not a problem even if parallel workers use their own
> > > GlobalVisState. I'll make that change and remove the 0004 patch which
> > > exposes GlobalVisState.
> > >
> > > I'll send the updated patch soon.
> >
> > I've attached the updated patches. This version includes the comments
> > from Melanie, some bug fixes, and comment updates.
>
> Rebased the patches as they conflicted with recent commits.
>
I've attached the new version of the patch set. There are no major
changes; I fixed some typos, improved the comments, and removed
duplicated code. I've also updated the commit messages.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size
---|---|---
v16-0001-Introduces-table-AM-APIs-for-parallel-table-vacu.patch | application/octet-stream | 9.8 KB
v16-0002-vacuumparallel.c-Support-parallel-vacuuming-for-.patch | application/octet-stream | 27.1 KB
v16-0003-Move-lazy-heap-scan-related-variables-to-new-str.patch | application/octet-stream | 31.2 KB
v16-0004-Support-parallelism-for-collecting-dead-items-du.patch | application/octet-stream | 58.3 KB