From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: maintenance_work_mem = 64kB doesn't work for vacuum
Date: 2025-03-11 21:14:52
Message-ID: CAD21AoBAXf6XqJzOXSZzhfeNohz7od_LPsyPuzUAvaNiGjqo4w@mail.gmail.com
Lists: pgsql-hackers
On Mon, Mar 10, 2025 at 2:53 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > Regarding that patch, we need to note that lpdead_items is a
> > counter that is not reset during the entire vacuum. Therefore, with
> > maintenance_work_mem = 64kB, once we collect at least one lpdead item,
> > we perform a cycle of index vacuuming and heap vacuuming for every
> > subsequent block even if they don't have any lpdead items. I think we
> > should use vacrel->dead_items_info->num_items instead.
>
> OK, I didn't study the code enough to realise that. My patch was only
> intended as an indication of what I thought. Please feel free to
> proceed with your own patch using the correct field.
>
> When playing with parallel vacuum, I also wondered if there should be
> some heuristic that avoids parallel vacuum unless the user
> specifically asked for it in the command when maintenance_work_mem is
> set to something far too low.
>
> Take the following case as an example:
> set maintenance_work_mem=64;
> create table aa(a int primary key, b int unique);
> insert into aa select a,a from generate_Series(1,1000000) a;
> delete from aa;
>
> -- try a vacuum with no parallelism
> vacuum (verbose, parallel 0) aa;
>
> system usage: CPU: user: 0.53 s, system: 0.00 s, elapsed: 0.57 s
>
> If I did the following instead:
>
> vacuum (verbose) aa;
>
> The vacuum goes parallel and it takes a very long time because a
> parallel worker is launched to do one page's worth of tuples. I see the
> following message 4425 times:
>
> INFO: launched 1 parallel vacuum worker for index vacuuming (planned: 1)
>
> and it takes about 30 seconds to complete: system usage: CPU: user: 14.00
> s, system: 0.81 s, elapsed: 30.86 s
>
> Shouldn't the code in parallel_vacuum_compute_workers() try and pick a
> good value for the workers based on the available memory and table
> size when the user does not explicitly specify how many workers they
> want?
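
To make the counter distinction quoted at the top concrete, here is a rough,
simplified sketch of the kind of per-block check being discussed in
lazy_scan_heap(). This is not the actual vacuumlazy.c code nor the proposed
patch, just an illustration of why num_items is the right field to test:

/*
 * Illustrative sketch only (not the real code or the patch).
 * vacrel->lpdead_items is a running total for the whole VACUUM and is never
 * reset, so once a single LP_DEAD item has been seen, a check against it
 * would keep firing for every subsequent block.
 * vacrel->dead_items_info->num_items is reset each time the TID store is
 * emptied after an index/heap vacuuming cycle, so it reflects only the dead
 * items currently buffered.
 */
if (TidStoreMemoryUsage(vacrel->dead_items) > vacrel->dead_items_info->max_bytes &&
    vacrel->dead_items_info->num_items > 0)
    lazy_vacuum(vacrel);    /* one cycle of index + heap vacuuming */
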
I think in your case the threshold of min_parallel_index_scan_size
didn't work well. Given that one worker is assigned to one index and
index vacuum time mostly depends on the index size, the index size
would be a good criterion for deciding the parallel degree. For example,
even if the table has only one dead item, index vacuuming can take a
long time when the indexes are large, since in the common cases (e.g.,
btree indexes) we scan the whole index; there we would want to use
parallel index vacuuming. Conversely, even if the table has many dead
items but its indexes are small (e.g., expression indexes), it would be
better not to use parallel index vacuuming.
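
For reference, a simplified, paraphrased sketch of the index-size filtering
that parallel_vacuum_compute_workers() in vacuumparallel.c already applies
when no parallel degree is requested (this is not the exact source code; the
real function also checks the index AM's parallel-vacuum options):

/*
 * Simplified, paraphrased sketch; not the exact code. Indexes smaller than
 * min_parallel_index_scan_size are not counted toward the parallel degree,
 * and the result is capped by max_parallel_maintenance_workers.
 */
int     nindexes_parallel = 0;
int     parallel_workers;

for (int i = 0; i < nindexes; i++)
{
    Relation    indrel = indrels[i];

    /* skip indexes too small to benefit from a dedicated worker */
    if (RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
        continue;

    nindexes_parallel++;
}

parallel_workers = Min(nindexes_parallel, max_parallel_maintenance_workers);

In your example both indexes are far larger than the default
min_parallel_index_scan_size of 512kB, so both qualify for a worker even
though each cycle only buffers about one page's worth of dead items, which
is why that threshold alone didn't help there.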
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com