From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2025-03-10 23:29:27
Message-ID: CAD21AoBDMpcquooJQXrZ7Ui-m+mZmnRnM4p3qdN8As4Q5GsZDQ@mail.gmail.com
Lists: pgsql-hackers
On Sun, Mar 9, 2025 at 11:28 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Mar 7, 2025 at 11:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Discussing with Amit offlist, I've run another benchmark test where no
> > data is loaded on the shared buffer. In the previous test, I loaded
> > all table blocks before running vacuum, so it was the best case. The
> > attached test results showed the worst case.
> >
> > Overall, while the numbers are not very stable, phase I did get sped
> > up a bit, but it was not as scalable as expected, which is not surprising.
> >
>
> Sorry, but it is difficult for me to understand this data because it
> doesn't contain the schema or details like what exactly the fraction is.
> It is also not clear how the workers are divided among the heap and
> indexes, for example, whether we use parallelism for both heap phases or
> only the first phase, and whether we reuse those workers for index
> vacuuming. These tests were probably discussed earlier, but it would be
> better to either add a summary of the information required to understand
> the results or at least a link to a previous email that has such details.
The test configuration is:
max_wal_size = 50GB
shared_buffers = 25GB
max_parallel_maintenance_workers = 10
max_parallel_workers = 20
max_worker_processes = 30
The test script is as follows ($m and $p are the fraction and the parallel
degree, respectively; deleting rows where mod(a, $m) = 0 removes roughly
1/$m of the table):
create unlogged table test_vacuum (a bigint) with (autovacuum_enabled=off);
insert into test_vacuum select i from generate_series(1,200000000) s(i);
create index idx_0 on test_vacuum (a);
create index idx_1 on test_vacuum (a);
create index idx_2 on test_vacuum (a);
create index idx_3 on test_vacuum (a);
create index idx_4 on test_vacuum (a);
delete from test_vacuum where mod(a, $m) = 0;
vacuum (verbose, parallel $p) test_vacuum; -- measured the execution time
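For reference, here is a minimal sketch of how a single configuration could
be driven from psql (a hypothetical wrapper, not the exact script I used; in
the actual runs $m and $p were substituted by a shell script and the VACUUM
execution time was measured separately):

\timing on
-- 'm' and 'p' are passed in from the command line, e.g.:
--   psql -v m=10 -v p=4 -f bench_one_run.sql
delete from test_vacuum where mod(a, :m) = 0;
vacuum (verbose, parallel :p) test_vacuum;  -- wall-clock time reported by \timing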
>
> > Please note that the test results show that phase III also got sped
> > up, but this is because in parallel vacuum we use more ring buffers
> > than the single-process vacuum. So we need to compare only the phase I
> > time when assessing the benefit of the parallelism.
> >
>
> Does phase 3 also use parallelism? If so, can we try to divide the
> ring buffers among workers, or at least try vacuum with an increased
> number of ring buffers? This would be good to do for both phases,
> if they both use parallelism.
No, only phase I was parallelized in this test. Since a parallel vacuum
uses (ring_buffer_size * parallel_degree) of buffer space in total, more
pages are kept in shared buffers during phase I, which increases cache
hits during phase III.
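If we want to separate that caching effect from the phase I speed-up
itself, one idea (just a sketch, assuming PostgreSQL 16 or later where the
BUFFER_USAGE_LIMIT option is available; I haven't measured this) would be
to rerun the serial case with an enlarged ring buffer so that both cases
get a comparable buffer budget, for example:

-- serial vacuum with an enlarged ring buffer, to roughly match the
-- combined ring-buffer space a parallel run would use (8MB is just an
-- example value):
vacuum (verbose, parallel 0, buffer_usage_limit '8MB') test_vacuum;

That should make the phase III numbers more comparable between the serial
and parallel runs.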
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com