From: Andres Freund <andres(at)anarazel(dot)de>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel heap vacuum
Date: 2025-03-22 20:00:25
Message-ID: 6h7yzrwqo2wxwvk2fajqw7yneg6qasrkiqyhm3wdfr3uzc2fzq@ixenqjp7oehs
Lists: pgsql-hackers
Hi,
On 2025-03-20 01:35:42 -0700, Masahiko Sawada wrote:
> One plausible solution would be that we don't use ReadStream in
> parallel heap vacuum cases but directly use
> table_block_parallelscan_xxx() instead. It works but we end up having
> two different scan methods for parallel and non-parallel lazy heap
> scan. I've implemented this idea in the attached v12 patches.
I think that's a bad idea - this means we'll never be able to use direct IO
for parallel VACUUMs, despite
a) The CPU overhead of buffered reads being a problem for VACUUM
b) Data ending up in the kernel page cache is rather wasteful for VACUUM, as
that's often data that won't otherwise be used again soon. I.e. these reads
would particularly benefit from using direct IO.
c) Even disregarding DIO, losing the ability to do larger reads, as provided
by read streams, loses a fair bit of efficiency (just try doing a
pg_prewarm of a large relation with io_combine_limit=1 vs the default
io_combine_limit=16).
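
FWIW, the two approaches don't have to be mutually exclusive: the read
stream's block-number callback could just pull blocks from the shared
parallel scan state, so the parallel path keeps going through the stream.
A rough sketch only, not tested code - the vacuum-side struct and function
names here are made up, and it assumes
table_block_parallelscan_startblock_init() has already run for the worker:

    typedef struct ParallelLVScanState
    {
        Relation    rel;
        ParallelBlockTableScanDesc pbscan;   /* shared parallel scan state */
        ParallelBlockTableScanWorkerData pbscanwork; /* this worker's chunk state */
    } ParallelLVScanState;

    static BlockNumber
    parallel_lazy_scan_next_block(ReadStream *stream,
                                  void *callback_private_data,
                                  void *per_buffer_data)
    {
        ParallelLVScanState *scan = callback_private_data;

        /*
         * table_block_parallelscan_nextpage() hands out blocks in
         * chunk-sized runs and returns InvalidBlockNumber once the
         * relation is exhausted - which is also how a read stream
         * callback signals end-of-stream.
         */
        return table_block_parallelscan_nextpage(scan->rel,
                                                 &scan->pbscanwork,
                                                 scan->pbscan);
    }

    /* per-worker scan setup */
    stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE,
                                        vac_strategy,
                                        scan->rel,
                                        MAIN_FORKNUM,
                                        parallel_lazy_scan_next_block,
                                        scan,
                                        0);

    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* process the page, same as the non-parallel path */
    }
    read_stream_end(stream);

Because blocks within a chunk are consecutive, read combining still works,
and both the parallel and non-parallel scans would get AIO / direct IO
support via the stream later, without a separate code path.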
Greetings,
Andres Freund