From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2025-03-10 23:29:27
Message-ID: CAD21AoBDMpcquooJQXrZ7Ui-m+mZmnRnM4p3qdN8As4Q5GsZDQ@mail.gmail.com
Lists: pgsql-hackers
On Sun, Mar 9, 2025 at 11:28 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Mar 7, 2025 at 11:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Discussing with Amit offlist, I've run another benchmark test where no
> > data is loaded on the shared buffer. In the previous test, I loaded
> > all table blocks before running vacuum, so it was the best case. The
> > attached test results showed the worst case.
> >
> > Overall, while the numbers are not very stable, phase I did get sped
> > up a bit, but it was not as scalable as expected, which is not surprising.
> >
>
> Sorry, but it is difficult for me to understand this data because it
> doesn't contain the schema or details like what exactly the fraction is.
> It is also not clear how the workers are divided among the heap and
> indexes, for example, whether we use parallelism for both heap phases or
> only the first phase, and whether we reuse those workers for index
> vacuuming. These tests were probably discussed earlier, but it would be
> better to either add a summary of the information required to understand
> the results or at least a link to a previous email that has such details.
The test configuration is:
max_wal_size = 50GB
shared_buffers = 25GB
max_parallel_maintenance_workers = 10
max_parallel_workers = 20
max_worker_processes = 30
The test script is as follows ($m and $p are the fraction and the parallel
degree, respectively; deleting rows where mod(a, $m) = 0 removes roughly
1/$m of the table):
create unlogged table test_vacuum (a bigint) with (autovacuum_enabled=off);
insert into test_vacuum select i from generate_series(1,200000000) s(i);
create index idx_0 on test_vacuum (a);
create index idx_1 on test_vacuum (a);
create index idx_2 on test_vacuum (a);
create index idx_3 on test_vacuum (a);
create index idx_4 on test_vacuum (a);
delete from test_vacuum where mod(a, $m) = 0;
vacuum (verbose, parallel $p) test_vacuum; -- measured the execution time
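For reference, here is a minimal sketch of how a single configuration could
be driven from psql (a hypothetical wrapper, not the exact script I used; in
the actual runs $m and $p were substituted by a shell script and the VACUUM
execution time was measured separately):

\timing on
-- 'm' and 'p' are passed in from the command line, e.g.:
--   psql -v m=10 -v p=4 -f bench_one_run.sql
delete from test_vacuum where mod(a, :m) = 0;
vacuum (verbose, parallel :p) test_vacuum;  -- wall-clock time reported by \timing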
>
> > Please note that the test results show that phase III also got sped
> > up, but this is because in parallel vacuum we use more ring buffers
> > than the single-process vacuum. So we need to compare only the phase I
> > time when assessing the benefit of the parallelism.
> >
>
> Does phase 3 also use parallelism? If so, can we try to divide the
> ring buffers among workers, or at least try vacuum with an increased
> number of ring buffers? This would be good to do for both phases,
> if they both use parallelism.
No, only phase I was parallelized in this test. Since a parallel vacuum
uses (ring_buffer_size * parallel_degree) of buffer space in total, more
pages are kept in shared buffers during phase I, which increases cache
hits during phase III.
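If we want to separate that caching effect from the phase I speed-up
itself, one idea (just a sketch, assuming PostgreSQL 16 or later where the
BUFFER_USAGE_LIMIT option is available; I haven't measured this) would be
to rerun the serial case with an enlarged ring buffer so that both cases
get a comparable buffer budget, for example:

-- serial vacuum with an enlarged ring buffer, to roughly match the
-- combined ring-buffer space a parallel run would use (8MB is just an
-- example value):
vacuum (verbose, parallel 0, buffer_usage_limit '8MB') test_vacuum;

That should make the phase III numbers more comparable between the serial
and parallel runs.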
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com