Re: Parallel heap vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: Parallel heap vacuum
Date: 2025-02-20 01:31:32
Message-ID: CAD21AoAr+Jck6MAeYhTN50youuL=+z6Lt8a52Hrn_GuNRTDMPg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 18, 2025 at 4:43 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Mon, Feb 17, 2025 at 1:11 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Feb 14, 2025 at 2:21 PM Melanie Plageman
> > <melanieplageman(at)gmail(dot)com> wrote:
> > >
> > > Since the failure rate is defined as a percent, couldn't we just have
> > > parallel workers set eager_scan_remaining_fails when they get their
> > > chunk assignment (as a percentage of their chunk size)? (I haven't
> > > looked at the code, so maybe this doesn't make sense).
> >
> > IIUC, since the chunk size eventually becomes 1, we cannot simply have
> > parallel workers derive the failure limit from their assigned chunk size.
>
> Yep. The ranges are too big (1-8192). The behavior would be too
> different from serial.
>
> > > Also, if you start with only doing parallelism for the third phase of
> > > heap vacuuming (second pass over the heap), this wouldn't be a problem
> > > because eager scanning only impacts the first phase.
> >
> > Right. I'm inclined to support only the second heap pass as the first
> > step. If we support parallelism only for the second pass, it cannot
> > help speed up freezing the entire table in emergency situations, but
> > it would be beneficial for cases where a big table has a large amount
> > of scattered garbage.
> >
> > At least, I'm going to reorganize the patch set to support parallelism
> > for the second pass first and then the first heap pass.
>
> Makes sense.
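
To illustrate the chunk-size point quoted above, here is a minimal
sketch of why a per-chunk failure limit behaves differently from the
serial case. It is not code from the patch set: fail_budget() and
total_budget() are hypothetical helpers, the 3% rate is only an
example value, and fixed-size chunks are a simplification of how
chunks are actually handed out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of eager-freeze failures allowed for a
 * range of nblocks, given a failure rate expressed as a fraction.
 * The result is truncated toward zero.
 */
static uint32_t
fail_budget(uint32_t nblocks, double fail_rate)
{
    assert(fail_rate >= 0.0 && fail_rate <= 1.0);
    return (uint32_t) (fail_rate * nblocks);
}

/*
 * Hypothetical helper: total failures allowed over total_blocks when
 * the budget is computed independently for each chunk of chunk_size
 * blocks (the last chunk may be shorter).
 */
static uint32_t
total_budget(uint32_t total_blocks, uint32_t chunk_size, double fail_rate)
{
    uint32_t    total = 0;
    uint32_t    done = 0;

    assert(chunk_size > 0);

    while (done < total_blocks)
    {
        uint32_t    n = total_blocks - done;

        if (n > chunk_size)
            n = chunk_size;

        /* Each chunk truncates its own budget, so small chunks lose it. */
        total += fail_budget(n, fail_rate);
        done += n;
    }

    return total;
}
```

Under these assumptions, 8192 blocks handled as one chunk allow 245
failures, the same blocks split into 64-block chunks allow 128 in
total, and 1-block chunks allow none at all, so eager scanning would
effectively be disabled once the chunk size reaches 1.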

I've attached the updated patches. In this version, I focused on
parallelizing only the second pass over the heap. It's more
straightforward than supporting the first pass, though it still
requires many preliminary changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment                                                        Content-Type              Size
v9-0003-radixtree.h-support-shared-iteration.patch                application/octet-stream  16.0 KB
v9-0002-vacuumparallel.c-Support-parallel-table-vacuuming.patch   application/octet-stream  21.4 KB
v9-0004-tidstore.c-support-shared-iteration-on-TidStore.patch     application/octet-stream  7.1 KB
v9-0005-Move-some-fields-of-LVRelState-to-LVVacCounters-s.patch   application/octet-stream  7.9 KB
v9-0006-Support-parallelism-for-removing-dead-items-durin.patch   application/octet-stream  18.7 KB
v9-0001-Introduces-table-AM-APIs-for-parallel-table-vacuu.patch   application/octet-stream  5.1 KB
